date:20210411

Re: [Question] Binaries of virtio-gpu-wddm-dod?

2021-04-11 Thread Vadim Rozenfeld

On Fri, 2021-04-09 at 09:27 -0400, Mike Ladouceur wrote:
> Hi, I'm wondering where I can find binaries of virtio-gpu-wddm-dod to
> test? I tried to build but I guess I'm running too new a version of
> Windows or VS/SDK/WDK? I've seen mention of prewhql ISO's with
> binaries but there's never any links? I understand it's in
> development phase. Thanks!

Hi Mike,

I'm going to update 
https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/
latest  with build 196 soon.

Best,
Vadim.

Re: [PULL 0/2] x86 and CPU bug fixes for 6.0-rc3

2021-04-11 Thread Peter Maydell

On Fri, 9 Apr 2021 at 21:22, Eduardo Habkost  wrote:
>
> The following changes since commit 471387aa1446e2583f372f79327cc0a8c802b4b4:
>
>   Merge remote-tracking branch 'remotes/cohuck-gitlab/tags/s390x-20210409' 
> into staging (2021-04-09 17:21:18 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/ehabkost/qemu.git tags/x86-next-pull-request
>
> for you to fetch changes up to 0b47ec4b95ad1952e55e639711d442f8ec6e1345:
>
>   cpu/core: Fix "help" of CPU core device types (2021-04-09 16:05:16 -0400)
>
> 
> x86 and CPU bug fixes for 6.0-rc3
>
> * Add missing features to EPYC-Rome CPU model (Babu Moger)
> * Fix crash with "-device ...-cpu-core,help" (Greg Kurz)
>
> 

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0
for any user-visible changes.

-- PMM

[RFC v9 01/29] hw/vfio/common: trace vfio_connect_container operations

2021-04-11 Thread Eric Auger

We currently trace vfio_disconnect_container() but we do not trace
the container <-> group creation, which can be useful to understand
the VFIO topology.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 3 +++
 hw/vfio/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ae5654fcdb..a456455517 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1742,6 +1742,8 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 QLIST_FOREACH(container, &space->containers, next) {
 if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
 group->container = container;
+trace_vfio_connect_existing_container(group->groupid,
+  container->fd);
 QLIST_INSERT_HEAD(&container->group_list, group, container_next);
 vfio_kvm_device_add_group(group);
 return 0;
@@ -1775,6 +1777,7 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 if (ret) {
 goto free_container_exit;
 }
+trace_vfio_connect_new_container(group->groupid, container->fd);
 
 switch (container->iommu_type) {
 case VFIO_TYPE1v2_IOMMU:
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 079f53acf2..2a41326c0f 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -104,6 +104,8 @@ vfio_listener_region_add_no_dma_map(const char *name, 
uint64_t iova, uint64_t si
 vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING 
region_del 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" 
- 0x%"PRIx64
 vfio_disconnect_container(int fd) "close container->fd=%d"
+vfio_connect_existing_container(int groupid, int container_fd) "group=%d 
existing container fd=%d"
+vfio_connect_new_container(int groupid, int container_fd) "group=%d new 
container fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
 vfio_get_device(const char * name, unsigned int flags, unsigned int 
num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
 vfio_put_base_device(int fd) "close vdev->fd=%d"
-- 
2.26.3

[RFC v9 02/29] update-linux-headers: Import iommu.h

2021-04-11 Thread Eric Auger

Update the script to import the new iommu.h uapi header.

Signed-off-by: Eric Auger 
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 1050e36169..b1abafac3c 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -142,7 +142,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h \
+for header in kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h iommu.h \
   psci.h psp-sev.h userfaultfd.h mman.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
-- 
2.26.3

[RFC v9 00/29] vSMMUv3/pSMMUv3 2 stage VFIO integration

2021-04-11 Thread Eric Auger

Up to now vSMMUv3 has not been integrated with VFIO. VFIO
integration requires programming the physical IOMMU consistently
with the guest mappings. However, as opposed to VTD, SMMUv3 has
no "Caching Mode" which allows easy trapping of guest mappings.
This means the vSMMUv3 cannot use the same VFIO integration as VTD.

However, SMMUv3 has 2 translation stages. This was devised with
the virtualization use case in mind, where stage 1 is "owned" by the
guest whereas the host uses stage 2 for VM isolation.

This series sets up this nested translation stage. It only works
if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
other words, it does not work if there is a physical SMMUv2).

- We force the host to use stage 2 instead of stage 1, when we
  detect a vSMMUv3 is behind a VFIO device. For a VFIO device
  without any virtual IOMMU, we still use stage 1 as many existing
  SMMUs expect this behavior.
- We use PCIPASIDOps to propagate guest stage 1 config changes on
  STE (Stream Table Entry) changes.
- We implement a specific UNMAP notifier that conveys guest
  IOTLB invalidations to the host.
- We register MSI IOVA/GPA bindings with the host so that the latter
  can build a nested stage translation.
- As the legacy MAP notifier is not called anymore, we must make
  sure stage 2 mappings are set. This is achieved through another
  prereg memory listener.
- Physical SMMU stage 1 related faults are reported to the guest
  via an eventfd mechanism and exposed through a dedicated VFIO-PCI
  region. Then they are reinjected into the guest.
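
To make the stage 1 / stage 2 split above concrete, here is a minimal,
self-contained sketch of a nested walk. It is not QEMU or SMMU code: the
ToyMapping/ToyStage types and the toy_* functions are hypothetical, and a
flat page-granular table replaces the real multi-level page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)

/* One page-granular mapping; a real SMMU walks multi-level tables. */
typedef struct {
    uint64_t in_pfn;   /* input page frame number */
    uint64_t out_pfn;  /* output page frame number */
} ToyMapping;

/* One translation stage, modelled as a flat array of mappings. */
typedef struct {
    const ToyMapping *maps;
    size_t nb_maps;
} ToyStage;

/* Translate @in through one stage; returns false on a translation fault. */
static bool toy_stage_translate(const ToyStage *st, uint64_t in, uint64_t *out)
{
    assert(st && out);
    for (size_t i = 0; i < st->nb_maps; i++) {
        if (st->maps[i].in_pfn == (in >> TOY_PAGE_SHIFT)) {
            *out = (st->maps[i].out_pfn << TOY_PAGE_SHIFT) |
                   (in & TOY_PAGE_MASK);
            return true;
        }
    }
    return false;
}

/*
 * Nested walk: IOVA -> GPA through the guest-owned stage 1, then
 * GPA -> HPA through the host-owned stage 2.
 */
static bool toy_nested_translate(const ToyStage *s1, const ToyStage *s2,
                                 uint64_t iova, uint64_t *hpa)
{
    uint64_t gpa;

    if (!toy_stage_translate(s1, iova, &gpa)) {
        return false;   /* stage 1 fault: the kind reported to the guest */
    }
    return toy_stage_translate(s2, gpa, hpa);   /* false: stage 2 fault */
}
```

With a stage 1 table mapping page 0x10 to 0x80 and a stage 2 table mapping
page 0x80 to 0x1234, toy_nested_translate() turns IOVA 0x10abc into
0x1234abc; an IOVA missing from stage 1 fails before stage 2 is consulted.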

Best Regards

Eric

All the patches can be found at:
https://github.com/eauger/qemu/tree/v6.0.0-rc2-2stage-rfcv9

Previous version:
v8: https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8

Kernel Dependencies:
[1] [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
[2] [PATCH v13 00/13] SMMUv3 Nested Stage Setup (VFIO part)
branch containing both:
https://github.com/eauger/linux/tree/v5.12-rc6-jean-iopf-14-2stage-v15

History:

v8 -> v9:
- added
  hw/arm/smmu-common: Allow domain invalidation for NH_ALL/NSNH_ALL
  following Chenxiang's report

v7 -> v8:
- adapt to changes to the kernel uapi
- Fix unregistration of MSI bindings
- applies on top of range invalidation fixes
- changes in IOTLBEntry (flags)
- addressed all the comments from reviewers/testers I hope.
  Many thanks to all of you! see individual logs


Eric Auger (28):
  hw/vfio/common: trace vfio_connect_container operations
  update-linux-headers: Import iommu.h
  header update against 5.12-rc6 and IOMMU/VFIO nested stage APIs
  memory: Add new fields in IOTLBEntry
  hw/arm/smmuv3: Improve stage1 ASID invalidation
  hw/arm/smmu-common: Allow domain invalidation for NH_ALL/NSNH_ALL
  memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
  memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute
  memory: Introduce IOMMU Memory Region inject_faults API
  iommu: Introduce generic header
  vfio: Force nested if iommu requires it
  vfio: Introduce hostwin_from_range helper
  vfio: Introduce helpers to DMA map/unmap a RAM section
  vfio: Set up nested stage mappings
  vfio: Pass stage 1 MSI bindings to the host
  vfio: Helper to get IRQ info including capabilities
  vfio/pci: Register handler for iommu fault
  vfio/pci: Set up the DMA FAULT region
  vfio/pci: Implement the DMA fault handler
  hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute
  hw/arm/smmuv3: Store the PASID table GPA in the translation config
  hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
  hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation
  hw/arm/smmuv3: Pass stage 1 configurations to the host
  hw/arm/smmuv3: Implement fault injection
  hw/arm/smmuv3: Allow MAP notifiers
  pci: Add return_page_response pci ops
  vfio/pci: Implement return_page_response page response callback

Liu Yi L (1):
  pci: introduce PCIPASIDOps to PCIDevice

 hw/arm/smmu-internal.h|   1 +
 hw/vfio/pci.h |  11 +
 include/exec/memory.h |  64 +-
 include/hw/arm/smmu-common.h  |   1 +
 include/hw/iommu/iommu.h  |  36 ++
 include/hw/pci/pci.h  |  15 +
 include/hw/vfio/vfio-common.h |  19 +
 include/standard-headers/drm/drm_fourcc.h |  23 +-
 include/standard-headers/linux/ethtool.h  |  54 +-
 include/standard-headers/linux/fuse.h |   3 +-
 include/standard-headers/linux/input.h|   2 +-
 .../standard-headers/rdma/vmw_pvrdma-abi.h|   7 +
 linux-headers/asm-generic/unistd.h|   4 +-
 linux-headers/asm-mips/unistd_n32.h   |   1 +
 linux-headers/asm-mips/unistd_n64.h   |   1 +
 linux-headers/asm-mips/unistd_o32.h   |   1 +
 linux-headers/asm-powerpc/kvm.h   |   2 +
 linux-headers/asm-powerpc/unistd_32.h |   1 +
 linux-headers/asm-powerpc/unistd_64.h |   1 +
 linux-headers/asm-s390/u

[RFC v9 04/29] memory: Add new fields in IOTLBEntry

2021-04-11 Thread Eric Auger

The current IOTLBEntry has become too simple to interact with
some physical IOMMUs. IOTLBs can be invalidated with different
granularities: domain, pasid, addr. The current IOTLB entry only offers
page selective invalidation. Let's add a granularity field
that conveys this information.

TLB entries are usually tagged with some ids such as the asid
or pasid. When propagating an invalidation command from the
guest to the host, we need to pass those IDs.

Also we add a leaf field which indicates, in case of invalidation
notification, whether only cache entries for the last level of
translation are required to be invalidated.

A flag field is introduced to inform whether those fields are set.

To ensure that existing users do not use those new fields,
zero-initialize the IOMMUTLBEvents where needed.
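
As an illustration of how a consumer might interpret the new fields, here is
a small sketch. The enum values and IOMMU_INV_FLAGS_* names are copied from
this patch; ToyTLBEntry, ToyHostInval and toy_build_host_inval() are
hypothetical and only mirror the relevant subset of IOMMUTLBEntry. The
choice to fill the range only for address granularity is an assumption of
this sketch, not something the patch mandates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from the patch. */
typedef enum {
    IOMMU_INV_GRAN_ADDR = 0,
    IOMMU_INV_GRAN_PASID,
    IOMMU_INV_GRAN_DOMAIN,
} IOMMUInvGranularity;

#define IOMMU_INV_FLAGS_PASID  (1 << 0)
#define IOMMU_INV_FLAGS_ARCHID (1 << 1)
#define IOMMU_INV_FLAGS_LEAF   (1 << 2)

/* Trimmed-down stand-in for IOMMUTLBEntry (hypothetical). */
typedef struct {
    uint64_t iova;
    uint64_t addr_mask;
    IOMMUInvGranularity granularity;
    uint32_t flags;
    uint32_t arch_id;
    uint32_t pasid;
    bool leaf;
} ToyTLBEntry;

/* Hypothetical request a consumer could build for the host. */
typedef struct {
    IOMMUInvGranularity granularity;
    bool has_arch_id, has_pasid, leaf_only;
    uint32_t arch_id, pasid;
    uint64_t start, size;   /* only filled for address granularity */
} ToyHostInval;

static ToyHostInval toy_build_host_inval(const ToyTLBEntry *e)
{
    ToyHostInval req = { .granularity = e->granularity };

    assert(e);
    /* Tags are only trusted when the matching flag says they were set. */
    if (e->flags & IOMMU_INV_FLAGS_ARCHID) {
        req.has_arch_id = true;
        req.arch_id = e->arch_id;
    }
    if (e->flags & IOMMU_INV_FLAGS_PASID) {
        req.has_pasid = true;
        req.pasid = e->pasid;
    }
    if (e->granularity == IOMMU_INV_GRAN_ADDR) {
        req.start = e->iova;
        req.size = e->addr_mask + 1;
        req.leaf_only = (e->flags & IOMMU_INV_FLAGS_LEAF) && e->leaf;
    }
    return req;
}
```

Note that a zero-initialized entry decodes as IOMMU_INV_GRAN_ADDR with no
flags set, i.e. the page selective behaviour existing users already rely on.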

Signed-off-by: Eric Auger 

---

v7 -> v8:
- add pasid, granularity and flags
---
 include/exec/memory.h| 36 +++-
 hw/arm/smmu-common.c |  2 +-
 hw/arm/smmuv3.c  |  2 +-
 hw/i386/intel_iommu.c|  6 +++---
 hw/ppc/spapr_iommu.c |  2 +-
 hw/virtio/virtio-iommu.c |  4 ++--
 6 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5728a681b2..94b9157249 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -75,14 +75,48 @@ typedef enum {
 IOMMU_RW   = 3,
 } IOMMUAccessFlags;
 
+/* Granularity of the cache invalidation */
+typedef enum {
+IOMMU_INV_GRAN_ADDR = 0,
+IOMMU_INV_GRAN_PASID,
+IOMMU_INV_GRAN_DOMAIN,
+} IOMMUInvGranularity;
+
 #define IOMMU_ACCESS_FLAG(r, w) (((r) ? IOMMU_RO : 0) | ((w) ? IOMMU_WO : 0))
 
+/**
+ * IOMMUTLBEntry - IOMMU TLB entry
+ *
+ * Structure used when performing a translation or when notifying MAP or
+ * UNMAP (invalidation) events
+ *
+ * @target_as: target address space
+ * @iova: IO virtual address (input)
+ * @translated_addr: translated address (output)
+ * @addr_mask: address mask (0xfff means 4K binding), must be multiple of 2
+ * @perm: permission flag of the mapping (NONE encodes no mapping or
+ * invalidation notification)
+ * @granularity: granularity of the invalidation
+ * @flags: informs whether the following fields are set
+ * @arch_id: architecture specific ID tagging the TLB
+ * @pasid: PASID tagging the TLB
+ * @leaf: when @perm is NONE, indicates whether only caches for the last
+ * level of translation need to be invalidated.
+ */
 struct IOMMUTLBEntry {
 AddressSpace*target_as;
 hwaddr   iova;
 hwaddr   translated_addr;
-hwaddr   addr_mask;  /* 0xfff = 4k translation */
+hwaddr   addr_mask;
 IOMMUAccessFlags perm;
+IOMMUInvGranularity granularity;
+#define IOMMU_INV_FLAGS_PASID  (1 << 0)
+#define IOMMU_INV_FLAGS_ARCHID (1 << 1)
+#define IOMMU_INV_FLAGS_LEAF   (1 << 2)
+uint32_t flags;
+uint32_t arch_id;
+uint32_t pasid;
+bool leaf;
 };
 
 /*
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 84d2c62c26..0ba3dca3b8 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -471,7 +471,7 @@ IOMMUMemoryRegion *smmu_iommu_mr(SMMUState *s, uint32_t sid)
 /* Unmap the whole notifier's range */
 static void smmu_unmap_notifier_range(IOMMUNotifier *n)
 {
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 
 event.type = IOMMU_NOTIFIER_UNMAP;
 event.entry.target_as = &address_space_memory;
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 3b87324ce2..d037d6df5b 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -801,7 +801,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
uint8_t tg, uint64_t num_pages)
 {
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 uint8_t granule;
 
 if (!tg) {
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6be8f32918..1c5b43f902 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1195,7 +1195,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t 
start,
 uint32_t offset;
 uint64_t slpte;
 uint64_t subpage_size, subpage_mask;
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 uint64_t iova = start;
 uint64_t iova_next;
 int ret = 0;
@@ -2427,7 +2427,7 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState 
*s,
   VTDInvDesc *inv_desc)
 {
 VTDAddressSpace *vtd_dev_as;
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 struct VTDBus *vtd_bus;
 hwaddr addr;
 uint64_t sz;
@@ -3483,7 +3483,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, 
IOMMUNotifier *n)
 size = remain = end - start + 1;
 
 while (remain >= VTD_PAGE_SIZE) {
-IOMMUTLBEvent event;
+IOMMUTLBEvent event = {};
 uint64_t mask = dma_aligned_pow2_mask(start, end, s->aw_bits);
 uint64_t size = mask + 1;
 
diff --gi

[RFC v9 03/29] header update against 5.12-rc6 and IOMMU/VFIO nested stage APIs

2021-04-11 Thread Eric Auger

Signed-off-by: Eric Auger 
---
 include/standard-headers/drm/drm_fourcc.h |  23 ++-
 include/standard-headers/linux/ethtool.h  |  54 +++---
 include/standard-headers/linux/fuse.h |   3 +-
 include/standard-headers/linux/input.h|   2 +-
 .../standard-headers/rdma/vmw_pvrdma-abi.h|   7 +
 linux-headers/asm-generic/unistd.h|   4 +-
 linux-headers/asm-mips/unistd_n32.h   |   1 +
 linux-headers/asm-mips/unistd_n64.h   |   1 +
 linux-headers/asm-mips/unistd_o32.h   |   1 +
 linux-headers/asm-powerpc/kvm.h   |   2 +
 linux-headers/asm-powerpc/unistd_32.h |   1 +
 linux-headers/asm-powerpc/unistd_64.h |   1 +
 linux-headers/asm-s390/unistd_32.h|   1 +
 linux-headers/asm-s390/unistd_64.h|   1 +
 linux-headers/asm-x86/kvm.h   |   1 +
 linux-headers/asm-x86/unistd_32.h |   1 +
 linux-headers/asm-x86/unistd_64.h |   1 +
 linux-headers/asm-x86/unistd_x32.h|   1 +
 linux-headers/linux/kvm.h |  89 +
 linux-headers/linux/vfio.h| 169 +-
 20 files changed, 337 insertions(+), 27 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index c47e19810c..a61ae520c2 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -526,6 +526,25 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS fourcc_mod_code(INTEL, 7)
 
+/*
+ * Intel Color Control Surface with Clear Color (CCS) for Gen-12 render
+ * compression.
+ *
+ * The main surface is Y-tiled and is at plane index 0 whereas CCS is linear
+ * and at index 1. The clear color is stored at index 2, and the pitch should
+ * be ignored. The clear color structure is 256 bits. The first 128 bits
+ * represents Raw Clear Color Red, Green, Blue and Alpha color each represented
+ * by 32 bits. The raw clear color is consumed by the 3d engine and generates
+ * the converted clear color of size 64 bits. The first 32 bits store the Lower
+ * Converted Clear Color value and the next 32 bits store the Higher Converted
+ * Clear Color value when applicable. The Converted Clear Color values are
+ * consumed by the DE. The last 64 bits are used to store Color Discard Enable
+ * and Depth Clear Value Valid which are ignored by the DE. A CCS cache line
+ * corresponds to an area of 4x1 tiles in the main surface. The main surface
+ * pitch is required to be a multiple of 4 tile widths.
+ */
+#define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
@@ -1035,9 +1054,9 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t 
modifier)
  * Not all combinations are valid, and different SoCs may support different
  * combinations of layout and options.
  */
-#define __fourcc_mod_amlogic_layout_mask 0xf
+#define __fourcc_mod_amlogic_layout_mask 0xff
 #define __fourcc_mod_amlogic_options_shift 8
-#define __fourcc_mod_amlogic_options_mask 0xf
+#define __fourcc_mod_amlogic_options_mask 0xff
 
 #define DRM_FORMAT_MOD_AMLOGIC_FBC(__layout, __options) \
fourcc_mod_code(AMLOGIC, \
diff --git a/include/standard-headers/linux/ethtool.h 
b/include/standard-headers/linux/ethtool.h
index 8bfd01d230..8e166b3c49 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -26,6 +26,14 @@
  * have the same layout for 32-bit and 64-bit userland.
  */
 
+/* Note on reserved space.
+ * Reserved fields must not be accessed directly by user space because
+ * they may be replaced by a different field in the future. They must
+ * be initialized to zero before making the request, e.g. via memset
+ * of the entire structure or implicitly by not being set in a structure
+ * initializer.
+ */
+
 /**
  * struct ethtool_cmd - DEPRECATED, link control and status
  * This structure is DEPRECATED, please use struct ethtool_link_settings.
@@ -67,6 +75,7 @@
  * and other link features that the link partner advertised
  * through autonegotiation; 0 if unknown or not applicable.
  * Read-only.
+ * @reserved: Reserved for future use; see the note on reserved space.
  *
  * The link speed in Mbps is split between @speed and @speed_hi.  Use
  * the ethtool_cmd_speed() and ethtool_cmd_speed_set() functions to
@@ -155,6 +164,7 @@ static inline uint32_t ethtool_cmd_speed(const struct 
ethtool_cmd *ep)
  * @bus_info: Device bus address.  This should match the dev_name()
  * string for the underlying bus device, if there is one.  May be
  * an empty string.
+ * @reserved2: Reserved for future use; see the note on reserved space.
  * @n_priv_flags: Number of flags valid for %ETHTOOL_GPFLAGS and
  * %ETHTOOL_SPFLAGS commands; also the number of strings in the
  * %ETH_SS_PRIV_FLAGS set
@@ -356,6 +366,7 @@ struct ethtool_eeprom {

[RFC v9 06/29] hw/arm/smmu-common: Allow domain invalidation for NH_ALL/NSNH_ALL

2021-04-11 Thread Eric Auger

NH_ALL/NSNH_ALL corresponds to a domain granularity invalidation,
i.e. the whole notifier range gets invalidated, whatever the ASID.
So let's set the granularity to IOMMU_INV_GRAN_DOMAIN to allow
the consumer to benefit from the info if it can.

Signed-off-by: Eric Auger 
Suggested-by: chenxiang (M) 
---
 hw/arm/smmu-common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 0ba3dca3b8..c33d03de67 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -478,6 +478,7 @@ static void smmu_unmap_notifier_range(IOMMUNotifier *n)
 event.entry.iova = n->start;
 event.entry.perm = IOMMU_NONE;
 event.entry.addr_mask = n->end - n->start;
+event.entry.granularity = IOMMU_INV_GRAN_DOMAIN;
 
 memory_region_notify_iommu_one(n, &event);
 }
-- 
2.26.3

[RFC v9 11/29] pci: introduce PCIPASIDOps to PCIDevice

2021-04-11 Thread Eric Auger

From: Liu Yi L 

This patch introduces PCIPASIDOps for IOMMU related operations.

https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.html
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00940.html

So far, setting up virt-SVA for an assigned SVA capable device requires
configuring host translation structures for a specific pasid (e.g. binding
the guest page table to the host and enabling nested translation in the
host). Besides, the vIOMMU emulator needs to forward the guest's cache
invalidations to the host since host nested translation is enabled. E.g. on
VT-d, the guest owns the 1st level translation table, thus cache
invalidation for the 1st level should be propagated to the host.

This patch adds a set_pasid_table callback to PCIPASIDOps, together with
the pci_setup_pasid_ops(), pci_device_is_pasid_ops_set() and
pci_device_set_pasid_table() helpers. The callback is expected to be
implemented by device passthrough modules such as vfio.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Signed-off-by: Liu Yi L 
Signed-off-by: Yi Sun 
---
 include/hw/pci/pci.h | 11 +++
 hw/pci/pci.c | 34 ++
 2 files changed, 45 insertions(+)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6be4e0c460..1f73c04975 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -9,6 +9,7 @@
 
 #include "hw/pci/pcie.h"
 #include "qom/object.h"
+#include "hw/iommu/iommu.h"
 
 extern bool pci_available;
 
@@ -265,6 +266,11 @@ struct PCIReqIDCache {
 };
 typedef struct PCIReqIDCache PCIReqIDCache;
 
+struct PCIPASIDOps {
+int (*set_pasid_table)(PCIBus *bus, int32_t devfn, IOMMUConfig *config);
+};
+typedef struct PCIPASIDOps PCIPASIDOps;
+
 struct PCIDevice {
 DeviceState qdev;
 bool partially_hotplugged;
@@ -360,6 +366,7 @@ struct PCIDevice {
 /* ID of standby device in net_failover pair */
 char *failover_pair_id;
 uint32_t acpi_index;
+PCIPASIDOps *pasid_ops;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
@@ -491,6 +498,10 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, 
int);
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
 
+void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops);
+bool pci_device_is_pasid_ops_set(PCIBus *bus, int32_t devfn);
+int pci_device_set_pasid_table(PCIBus *bus, int32_t devfn, IOMMUConfig 
*config);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8f35e13a0c..114855a0ac 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2731,6 +2731,40 @@ void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void 
*opaque)
 bus->iommu_opaque = opaque;
 }
 
+void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
+{
+assert(ops && !dev->pasid_ops);
+dev->pasid_ops = ops;
+}
+
+bool pci_device_is_pasid_ops_set(PCIBus *bus, int32_t devfn)
+{
+PCIDevice *dev;
+
+if (!bus) {
+return false;
+}
+
+dev = bus->devices[devfn];
+return !!(dev && dev->pasid_ops);
+}
+
+int pci_device_set_pasid_table(PCIBus *bus, int32_t devfn,
+   IOMMUConfig *config)
+{
+PCIDevice *dev;
+
+if (!bus) {
+return -EINVAL;
+}
+
+dev = bus->devices[devfn];
+if (dev && dev->pasid_ops && dev->pasid_ops->set_pasid_table) {
+return dev->pasid_ops->set_pasid_table(bus, devfn, config);
+}
+return -ENOENT;
+}
+
 static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 {
 Range *range = opaque;
-- 
2.26.3
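
For readers who want to see the registration and dispatch flow of the patch
above in isolation, here is a standalone sketch. All Toy* types and toy_*
functions are hypothetical stand-ins for PCIBus, PCIDevice, IOMMUConfig and
the pci_* helpers; the devfn bounds check is an addition of this sketch and
is not present in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DEVFN_MAX 256

/* Stand-in for IOMMUConfig. */
typedef struct ToyConfig {
    uint64_t pasid_table_gpa;
} ToyConfig;

typedef struct ToyBus ToyBus;

/* Stand-in for PCIPASIDOps. */
typedef struct ToyPASIDOps {
    int (*set_pasid_table)(ToyBus *bus, int32_t devfn, ToyConfig *config);
} ToyPASIDOps;

typedef struct ToyDevice {
    ToyPASIDOps *pasid_ops;
} ToyDevice;

struct ToyBus {
    ToyDevice *devices[TOY_DEVFN_MAX];
};

/* Registration: ops must be non-NULL and may only be installed once. */
static void toy_setup_pasid_ops(ToyDevice *dev, ToyPASIDOps *ops)
{
    assert(ops && !dev->pasid_ops);
    dev->pasid_ops = ops;
}

/* Dispatch: -EINVAL without a usable bus/devfn, -ENOENT without a callback. */
static int toy_device_set_pasid_table(ToyBus *bus, int32_t devfn,
                                      ToyConfig *config)
{
    ToyDevice *dev;

    if (!bus || devfn < 0 || devfn >= TOY_DEVFN_MAX) {
        return -EINVAL;
    }
    dev = bus->devices[devfn];
    if (dev && dev->pasid_ops && dev->pasid_ops->set_pasid_table) {
        return dev->pasid_ops->set_pasid_table(bus, devfn, config);
    }
    return -ENOENT;
}

/* Passthrough side: records the table address it was handed. */
static uint64_t toy_last_table_gpa;

static int toy_vfio_set_pasid_table(ToyBus *bus, int32_t devfn,
                                    ToyConfig *config)
{
    (void)bus;
    (void)devfn;
    toy_last_table_gpa = config->pasid_table_gpa;
    return 0;
}

static ToyPASIDOps toy_vfio_pasid_ops = {
    .set_pasid_table = toy_vfio_set_pasid_table,
};
```

The passthrough module installs toy_vfio_pasid_ops once per device; the
vIOMMU side then only ever calls toy_device_set_pasid_table() and does not
need to know which module, if any, is behind the device.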

[RFC v9 09/29] memory: Introduce IOMMU Memory Region inject_faults API

2021-04-11 Thread Eric Auger

This new API allows injecting @count iommu_faults into
the IOMMU memory region.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 24 
 softmmu/memory.c  | 10 ++
 2 files changed, 34 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index ac8521b29a..527f77c453 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -65,6 +65,8 @@ struct ReservedRegion {
 unsigned type;
 };
 
+struct iommu_fault;
+
 typedef struct IOMMUTLBEntry IOMMUTLBEntry;
 
 /* See address_space_translate: bit 0 is read, bit 1 is write.  */
@@ -475,6 +477,19 @@ struct IOMMUMemoryRegionClass {
  int (*iommu_set_page_size_mask)(IOMMUMemoryRegion *iommu,
  uint64_t page_size_mask,
  Error **errp);
+
+/*
+ * Inject @count faults into the IOMMU memory region
+ *
+ * Optional method: if this method is not provided, then
+ * memory_region_inject_faults() will return -ENOENT
+ *
+ * @iommu: the IOMMU memory region to inject the faults in
+ * @count: number of faults to inject
+ * @buf: fault buffer
+ */
+int (*inject_faults)(IOMMUMemoryRegion *iommu, int count,
+ struct iommu_fault *buf);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -1520,6 +1535,15 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion 
*iommu_mr);
 int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
uint64_t page_size_mask,
Error **errp);
+/**
+ * memory_region_inject_faults : inject @count faults stored in @buf
+ *
+ * @iommu_mr: the IOMMU memory region
+ * @count: number of faults to be injected
+ * @buf: buffer containing the faults
+ */
+int memory_region_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+struct iommu_fault *buf);
 
 /**
  * memory_region_name: get a memory region's name
diff --git a/softmmu/memory.c b/softmmu/memory.c
index d4493ef9e4..1dd34356c0 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2030,6 +2030,16 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion 
*iommu_mr)
 return imrc->num_indexes(iommu_mr);
 }
 
+int memory_region_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+struct iommu_fault *buf)
+{
+IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+if (!imrc->inject_faults) {
+return -ENOENT;
+}
+return imrc->inject_faults(iommu_mr, count, buf);
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
 uint8_t mask = 1 << client;
-- 
2.26.3
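
The optional-method behaviour of the patch above (a class may leave
inject_faults unset, in which case the wrapper returns -ENOENT) can be
sketched outside QEMU as follows. ToyRegion, ToyRegionClass, struct
toy_fault and the toy_* functions are hypothetical; the nb_injected counter
exists only so the effect is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Opaque fault record; stands in for the kernel's struct iommu_fault. */
struct toy_fault {
    unsigned long addr;
};

typedef struct ToyRegion ToyRegion;

/* Per-class method table; inject_faults may be left NULL. */
typedef struct ToyRegionClass {
    int (*inject_faults)(ToyRegion *mr, int count, struct toy_fault *buf);
} ToyRegionClass;

struct ToyRegion {
    const ToyRegionClass *klass;
    int nb_injected;    /* bookkeeping for this sketch only */
};

/* Wrapper: forwards to the hook, or reports -ENOENT when it is absent. */
static int toy_region_inject_faults(ToyRegion *mr, int count,
                                    struct toy_fault *buf)
{
    assert(mr && mr->klass);
    if (!mr->klass->inject_faults) {
        return -ENOENT;
    }
    return mr->klass->inject_faults(mr, count, buf);
}

/* Example hook: a virtual IOMMU would turn each record into a guest event. */
static int toy_smmu_inject_faults(ToyRegion *mr, int count,
                                  struct toy_fault *buf)
{
    for (int i = 0; i < count; i++) {
        (void)buf[i].addr;  /* a real model would queue an event here */
        mr->nb_injected++;
    }
    return 0;
}

/* A class without the hook, and one that provides it. */
static const ToyRegionClass toy_plain_class = { .inject_faults = NULL };
static const ToyRegionClass toy_smmu_class = {
    .inject_faults = toy_smmu_inject_faults,
};
```

Callers can therefore treat -ENOENT as "this IOMMU model does not support
fault injection" rather than as a failure of the injection itself.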

[RFC v9 05/29] hw/arm/smmuv3: Improve stage1 ASID invalidation

2021-04-11 Thread Eric Auger

At the moment the ASID invalidation command (CMD_TLBI_NH_ASID) is
propagated as a domain invalidation (the whole notifier range
is invalidated independently of any ASID information).

The new granularity field now allows us to be more precise and
restrict the invalidation to a particular ASID. Set the corresponding
fields and flag.

We still keep the iova and addr_mask settings for consumers that
do not support the new fields, like VHOST.

Signed-off-by: Eric Auger 

---

v8 -> v9:
- restore the iova and addr_mask settings for consumers that do
  not support the new fields like VHOST
---
 hw/arm/smmuv3.c | 44 ++--
 hw/arm/trace-events |  1 +
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index d037d6df5b..a4436868ba 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -835,6 +835,31 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 memory_region_notify_iommu_one(n, &event);
 }
 
+/**
+ * smmuv3_notify_asid - call the notifier @n for a given asid
+ *
+ * @mr: IOMMU mr region handle
+ * @n: notifier to be called
+ * @asid: address space ID or negative value if we don't care
+ */
+static void smmuv3_notify_asid(IOMMUMemoryRegion *mr,
+   IOMMUNotifier *n, int asid)
+{
+IOMMUTLBEvent event = {};
+
+event.type = IOMMU_NOTIFIER_UNMAP;
+event.entry.target_as = &address_space_memory;
+event.entry.perm = IOMMU_NONE;
+event.entry.granularity = IOMMU_INV_GRAN_PASID;
+event.entry.flags = IOMMU_INV_FLAGS_ARCHID;
+event.entry.arch_id = asid;
+event.entry.iova = n->start;
+event.entry.addr_mask = n->end - n->start;
+
+memory_region_notify_iommu_one(n, &event);
+}
+
+
 /* invalidate an asid/iova range tuple in all mr's */
 static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova,
   uint8_t tg, uint64_t num_pages)
@@ -910,6 +935,22 @@ smmuv3_invalidate_ste(gpointer key, gpointer value, 
gpointer user_data)
 return true;
 }
 
+static void smmuv3_s1_asid_inval(SMMUState *s, uint16_t asid)
+{
+SMMUDevice *sdev;
+
+trace_smmuv3_s1_asid_inval(asid);
+QLIST_FOREACH(sdev, &s->devices_with_notifiers, next) {
+IOMMUMemoryRegion *mr = &sdev->iommu;
+IOMMUNotifier *n;
+
+IOMMU_NOTIFIER_FOREACH(n, mr) {
+smmuv3_notify_asid(mr, n, asid);
+}
+}
+smmu_iotlb_inv_asid(s, asid);
+}
+
 static int smmuv3_cmdq_consume(SMMUv3State *s)
 {
 SMMUState *bs = ARM_SMMU(s);
@@ -1020,8 +1061,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 uint16_t asid = CMD_ASID(&cmd);
 
 trace_smmuv3_cmdq_tlbi_nh_asid(asid);
-smmu_inv_notifiers_all(&s->smmu_state);
-smmu_iotlb_inv_asid(bs, asid);
+smmuv3_s1_asid_inval(bs, asid);
 break;
 }
 case SMMU_CMD_TLBI_NH_ALL:
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index b79a91af5f..8e530ba79d 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -46,6 +46,7 @@ smmuv3_cmdq_cfgi_cd(uint32_t sid) "sid=0x%x"
 smmuv3_config_cache_hit(uint32_t sid, uint32_t hits, uint32_t misses, uint32_t 
perc) "Config cache HIT for sid=0x%x (hits=%d, misses=%d, hit rate=%d)"
 smmuv3_config_cache_miss(uint32_t sid, uint32_t hits, uint32_t misses, 
uint32_t perc) "Config cache MISS for sid=0x%x (hits=%d, misses=%d, hit 
rate=%d)"
 smmuv3_s1_range_inval(int vmid, int asid, uint64_t addr, uint8_t tg, uint64_t 
num_pages, uint8_t ttl, bool leaf) "vmid=%d asid=%d addr=0x%"PRIx64" tg=%d 
num_pages=0x%"PRIx64" ttl=%d leaf=%d"
+smmuv3_s1_asid_inval(int asid) "asid=%d"
 smmuv3_cmdq_tlbi_nh(void) ""
 smmuv3_cmdq_tlbi_nh_asid(uint16_t asid) "asid=%d"
 smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for sid=0x%x"
-- 
2.26.3
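
To illustrate why restricting the invalidation to one ASID matters, here is
a toy model of an ASID-tagged TLB. It is not the SMMU IOTLB implementation:
ToyTLBLine and the toy_tlb_* functions are hypothetical, and the cache is a
plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One cached translation, tagged with the ASID it belongs to. */
typedef struct {
    bool valid;
    uint16_t asid;
    uint64_t iova_pfn;
} ToyTLBLine;

/* Domain-wide invalidation: every valid line is dropped. */
static size_t toy_tlb_inv_all(ToyTLBLine *tlb, size_t n)
{
    size_t dropped = 0;

    assert(tlb || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid) {
            tlb[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}

/* ASID-scoped invalidation: lines of other address spaces survive. */
static size_t toy_tlb_inv_asid(ToyTLBLine *tlb, size_t n, uint16_t asid)
{
    size_t dropped = 0;

    assert(tlb || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].asid == asid) {
            tlb[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

Both functions return the number of lines dropped. With lines tagged ASID 1,
2 and 1, toy_tlb_inv_asid(tlb, 3, 1) drops two lines and leaves the ASID 2
entry cached, whereas toy_tlb_inv_all() would have discarded all three.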

[RFC v9 18/29] vfio/pci: Register handler for iommu fault

2021-04-11 Thread Eric Auger

We use the new extended IRQ VFIO_IRQ_TYPE_NESTED type and
VFIO_IRQ_SUBTYPE_DMA_FAULT subtype to set/unset
a notifier for physical DMA faults. The associated eventfd is
triggered, in nested mode, whenever a fault is detected at the
physical IOMMU level.

The actual handler will be implemented in subsequent patches.
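
For readers unfamiliar with the eventfd signaling that the notifier relies
on, here is a minimal Linux-only sketch using eventfd(2) directly. The
toy_notifier_* helpers are hypothetical and only approximate what QEMU's
EventNotifier does; in the real flow the kernel, not userspace, is the side
that signals the fd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a non-blocking eventfd; returns the fd, or -1 on error. */
static int toy_notifier_init(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Producer side: adds 1 to the counter, as a fault report would. */
static bool toy_notifier_signal(int fd)
{
    uint64_t one = 1;

    assert(fd >= 0);
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one);
}

/*
 * Consumer side: returns true if at least one event was pending. Reading
 * a non-semaphore eventfd returns the counter and resets it to zero; with
 * EFD_NONBLOCK the read fails with EAGAIN when nothing is pending.
 */
static bool toy_notifier_test_and_clear(int fd)
{
    uint64_t count;

    assert(fd >= 0);
    return read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count) &&
           count > 0;
}
```

Several signals that arrive before the handler runs are coalesced into a
single wakeup, so a fault handler built on this has to drain every pending
fault record each time it is called.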

Signed-off-by: Eric Auger 

---

v4 -> v5:
- index_to_str now returns the index name, ie. DMA_FAULT
- use the extended IRQ

v3 -> v4:
- check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
  before attempting to set signaling for it.
---
 hw/vfio/pci.h |  7 +
 hw/vfio/pci.c | 81 ++-
 2 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 64777516d1..a8b06737fb 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -114,6 +114,12 @@ typedef struct VFIOMSIXInfo {
 unsigned long *pending;
 } VFIOMSIXInfo;
 
+typedef struct VFIOPCIExtIRQ {
+struct VFIOPCIDevice *vdev;
+EventNotifier notifier;
+uint32_t index;
+} VFIOPCIExtIRQ;
+
 #define TYPE_VFIO_PCI "vfio-pci"
 OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI)
 
@@ -138,6 +144,7 @@ struct VFIOPCIDevice {
 PCIHostDeviceAddress host;
 EventNotifier err_notifier;
 EventNotifier req_notifier;
+VFIOPCIExtIRQ *ext_irqs;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a49029dfa4..71b411b61c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2864,6 +2864,76 @@ static PCIPASIDOps vfio_pci_pasid_ops = {
 .set_pasid_table = vfio_iommu_set_pasid_table,
 };
 
+static void vfio_dma_fault_notifier_handler(void *opaque)
+{
+VFIOPCIExtIRQ *ext_irq = opaque;
+
+if (!event_notifier_test_and_clear(&ext_irq->notifier)) {
+return;
+}
+}
+
+static int vfio_register_ext_irq_handler(VFIOPCIDevice *vdev,
+ uint32_t type, uint32_t subtype,
+ IOHandler *handler)
+{
+int32_t fd, ext_irq_index, index;
+struct vfio_irq_info *irq_info;
+Error *err = NULL;
+EventNotifier *n;
+int ret;
+
+ret = vfio_get_dev_irq_info(&vdev->vbasedev, type, subtype, &irq_info);
+if (ret) {
+return ret;
+}
+index = irq_info->index;
+ext_irq_index = irq_info->index - VFIO_PCI_NUM_IRQS;
+g_free(irq_info);
+
+vdev->ext_irqs[ext_irq_index].vdev = vdev;
+vdev->ext_irqs[ext_irq_index].index = index;
+n = &vdev->ext_irqs[ext_irq_index].notifier;
+
+ret = event_notifier_init(n, 0);
+if (ret) {
+error_report("vfio: Unable to init event notifier for ext irq %d(%d)",
+ ext_irq_index, ret);
+return ret;
+}
+
+fd = event_notifier_get_fd(n);
+qemu_set_fd_handler(fd, vfio_dma_fault_notifier_handler, NULL,
+&vdev->ext_irqs[ext_irq_index]);
+
+ret = vfio_set_irq_signaling(&vdev->vbasedev, index, 0,
+ VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err);
+if (ret) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+qemu_set_fd_handler(fd, NULL, NULL, vdev);
+event_notifier_cleanup(n);
+}
+return ret;
+}
+
+static void vfio_unregister_ext_irq_notifiers(VFIOPCIDevice *vdev)
+{
+VFIODevice *vbasedev = &vdev->vbasedev;
+Error *err = NULL;
+int i;
+
+for (i = 0; i < vbasedev->num_irqs - VFIO_PCI_NUM_IRQS; i++) {
+if (vfio_set_irq_signaling(vbasedev, i + VFIO_PCI_NUM_IRQS , 0,
+   VFIO_IRQ_SET_ACTION_TRIGGER, -1, &err)) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+}
+qemu_set_fd_handler(event_notifier_get_fd(&vdev->ext_irqs[i].notifier),
+NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->ext_irqs[i].notifier);
+}
+g_free(vdev->ext_irqs);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = VFIO_PCI(pdev);
@@ -2874,7 +2944,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 ssize_t len;
 struct stat st;
 int groupid;
-int i, ret;
+int i, ret, nb_ext_irqs;
 bool is_mdev;
 
 if (!vdev->vbasedev.sysfsdev) {
@@ -2962,6 +3032,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 goto error;
 }
 
+nb_ext_irqs = vdev->vbasedev.num_irqs - VFIO_PCI_NUM_IRQS;
+if (nb_ext_irqs > 0) {
+vdev->ext_irqs = g_new0(VFIOPCIExtIRQ, nb_ext_irqs);
+}
+
 vfio_populate_device(vdev, &err);
 if (err) {
 error_propagate(errp, err);
@@ -3173,6 +3248,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
+vfio_register_ext_irq_handler(vdev, VFIO_IRQ_TYPE_NESTED,
+  VFIO_IRQ_SUBTYPE_DMA_FAULT,
+

[RFC v9 07/29] memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute

2021-04-11 Thread Eric Auger

We introduce a new IOMMU Memory Region attribute,
IOMMU_ATTR_VFIO_NESTED, that tells whether the virtual IOMMU
requires HW nested paging for VFIO integration.

The current Intel virtual IOMMU device supports "Caching
Mode" and does not require two stages at the physical level to be
integrated with VFIO. However, SMMUv3 does not implement such a
"caching mode" and requires HW nested paging.

As such SMMUv3 is the first IOMMU device to advertise this
attribute.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h |  3 ++-
 hw/arm/smmuv3.c   | 12 
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 94b9157249..3af3cc1adb 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -273,7 +273,8 @@ typedef struct MemoryRegionClass {
 
 
 enum IOMMUMemoryRegionAttr {
-IOMMU_ATTR_SPAPR_TCE_FD
+IOMMU_ATTR_SPAPR_TCE_FD,
+IOMMU_ATTR_VFIO_NESTED,
 };
 
 /*
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index a4436868ba..7166008ab0 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1582,6 +1582,17 @@ static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
 return 0;
 }
 
+static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
+   enum IOMMUMemoryRegionAttr attr,
+   void *data)
+{
+if (attr == IOMMU_ATTR_VFIO_NESTED) {
+*(bool *) data = true;
+return 0;
+}
+return -EINVAL;
+}
+
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
   void *data)
 {
@@ -1589,6 +1600,7 @@ static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
 
 imrc->translate = smmuv3_translate;
 imrc->notify_flag_changed = smmuv3_notify_flag_changed;
+imrc->get_attr = smmuv3_get_attr;
 }
 
 static const TypeInfo smmuv3_type_info = {
-- 
2.26.3
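
The attribute query introduced by this patch follows a simple callback pattern, which can be sketched outside QEMU as follows. This is only an illustration: every `demo_*` name is hypothetical and stands in for the real IOMMUMemoryRegionAttr enum and get_attr class callback. The caller keeps a default of "not nested" when the model has no callback or rejects the attribute.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the IOMMU memory region attributes. */
enum demo_iommu_attr {
    DEMO_ATTR_SPAPR_TCE_FD,
    DEMO_ATTR_VFIO_NESTED,
};

/* Hypothetical callback type, modelled on the get_attr class method. */
typedef int (*demo_get_attr_fn)(enum demo_iommu_attr attr, void *data);

/* An SMMUv3-like model: always reports that nested paging is required. */
static int demo_smmu_get_attr(enum demo_iommu_attr attr, void *data)
{
    assert(data != NULL);

    if (attr == DEMO_ATTR_VFIO_NESTED) {
        *(bool *)data = true;
        return 0;
    }
    return -EINVAL;
}

/* Caller side: a missing callback or an error leaves the default in place. */
static bool demo_requires_nested(demo_get_attr_fn get_attr)
{
    bool nested = false;

    if (get_attr) {
        get_attr(DEMO_ATTR_VFIO_NESTED, &nested);
    }
    return nested;
}
```

A model that does not implement the callback is treated like one that does not require nested paging, which matches how the attribute is meant to be opt-in.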

[RFC v9 10/29] iommu: Introduce generic header

2021-04-11 Thread Eric Auger

This header is meant to expose data types used by
several IOMMU devices, such as structs for SVA and
nested stage configuration.

Signed-off-by: Eric Auger 
---
 include/hw/iommu/iommu.h | 28 
 1 file changed, 28 insertions(+)
 create mode 100644 include/hw/iommu/iommu.h

diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
new file mode 100644
index 00..12092bda7b
--- /dev/null
+++ b/include/hw/iommu/iommu.h
@@ -0,0 +1,28 @@
+/*
+ * common header for iommu devices
+ *
+ * Copyright Red Hat, Inc. 2019
+ *
+ * Authors:
+ *  Eric Auger 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_HW_IOMMU_IOMMU_H
+#define QEMU_HW_IOMMU_IOMMU_H
+#ifdef __linux__
+#include <linux/iommu.h>
+#endif
+
+typedef struct IOMMUConfig {
+union {
+#ifdef __linux__
+struct iommu_pasid_table_config pasid_cfg;
+#endif
+  };
+} IOMMUConfig;
+
+
+#endif /* QEMU_HW_IOMMU_IOMMU_H */
-- 
2.26.3

[RFC v9 20/29] vfio/pci: Implement the DMA fault handler

2021-04-11 Thread Eric Auger

Whenever the eventfd is triggered, we retrieve the DMA fault(s)
from the mmapped fault region and inject them into the IOMMU
memory region.

Signed-off-by: Eric Auger 
---
 hw/vfio/pci.h |  1 +
 hw/vfio/pci.c | 50 ++
 2 files changed, 51 insertions(+)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index eef91065f1..03ac8919ef 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -146,6 +146,7 @@ struct VFIOPCIDevice {
 EventNotifier req_notifier;
 VFIOPCIExtIRQ *ext_irqs;
 VFIORegion dma_fault_region;
+uint32_t fault_tail_index;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9d4e020b97..d7e563859f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2929,10 +2929,60 @@ static PCIPASIDOps vfio_pci_pasid_ops = {
 static void vfio_dma_fault_notifier_handler(void *opaque)
 {
 VFIOPCIExtIRQ *ext_irq = opaque;
+VFIOPCIDevice *vdev = ext_irq->vdev;
+PCIDevice *pdev = &vdev->pdev;
+AddressSpace *as = pci_device_iommu_address_space(pdev);
+IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(as->root);
+struct vfio_region_dma_fault header;
+struct iommu_fault *queue;
+char *queue_buffer = NULL;
+ssize_t bytes;
 
 if (!event_notifier_test_and_clear(&ext_irq->notifier)) {
 return;
 }
+
+bytes = pread(vdev->vbasedev.fd, &header, sizeof(header),
+  vdev->dma_fault_region.fd_offset);
+if (bytes != sizeof(header)) {
+error_report("%s unable to read the fault region header (0x%lx)",
+ __func__, bytes);
+return;
+}
+
+/* Normally the fault queue is mmapped */
+queue = (struct iommu_fault *)vdev->dma_fault_region.mmaps[0].mmap;
+if (!queue) {
+size_t queue_size = header.nb_entries * header.entry_size;
+
+error_report("%s: fault queue not mmapped: slower fault handling",
+ vdev->vbasedev.name);
+
+queue_buffer = g_malloc(queue_size);
+bytes =  pread(vdev->vbasedev.fd, queue_buffer, queue_size,
+   vdev->dma_fault_region.fd_offset + header.offset);
+if (bytes != queue_size) {
+error_report("%s unable to read the fault queue (0x%lx)",
+ __func__, bytes);
+return;
+}
+
+queue = (struct iommu_fault *)queue_buffer;
+}
+
+while (vdev->fault_tail_index != header.head) {
+memory_region_inject_faults(iommu_mr, 1,
+&queue[vdev->fault_tail_index]);
+vdev->fault_tail_index =
+(vdev->fault_tail_index + 1) % header.nb_entries;
+}
+bytes = pwrite(vdev->vbasedev.fd, &vdev->fault_tail_index, 4,
+   vdev->dma_fault_region.fd_offset);
+if (bytes != 4) {
+error_report("%s unable to write the fault region tail index (0x%lx)",
+ __func__, bytes);
+}
+g_free(queue_buffer);
 }
 
 static int vfio_register_ext_irq_handler(VFIOPCIDevice *vdev,
-- 
2.26.3
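
The head/tail loop in the fault handler above is a plain single-consumer ring buffer drain. The sketch below isolates that logic under simplifying assumptions: `struct demo_fault`, `demo_drain_faults()` and the callback are hypothetical names, and the queue is an ordinary array rather than an mmapped VFIO region.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified fault record; the real one is struct iommu_fault. */
struct demo_fault {
    uint32_t id;
};

typedef void (*demo_inject_fn)(const struct demo_fault *fault, void *opaque);

/*
 * Consume entries from tail up to (but excluding) head, wrapping at
 * nb_entries. Returns the new tail, which the consumer then writes back.
 */
static uint32_t demo_drain_faults(const struct demo_fault *queue,
                                  uint32_t nb_entries,
                                  uint32_t head, uint32_t tail,
                                  demo_inject_fn inject, void *opaque)
{
    assert(nb_entries > 0 && head < nb_entries && tail < nb_entries);

    while (tail != head) {
        inject(&queue[tail], opaque);
        tail = (tail + 1) % nb_entries;
    }
    return tail;
}

/* Example callback: count the injected faults and sum their ids. */
struct demo_stats {
    unsigned count;
    uint32_t id_sum;
};

static void demo_count(const struct demo_fault *fault, void *opaque)
{
    struct demo_stats *stats = opaque;

    stats->count++;
    stats->id_sum += fault->id;
}
```

With a four-entry queue, head at 1 and tail at 3, the drain visits entries 3 and 0 and returns a tail of 1. When head equals tail the queue is empty and nothing is injected.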

[RFC v9 08/29] memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute

2021-04-11 Thread Eric Auger

We introduce a new IOMMU Memory Region attribute, IOMMU_ATTR_MSI_TRANSLATE,
which tells whether the virtual IOMMU translates MSIs. ARM SMMU
will expose this attribute since, as opposed to Intel DMAR, MSIs
are translated like any other DMA request.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3af3cc1adb..ac8521b29a 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -275,6 +275,7 @@ typedef struct MemoryRegionClass {
 enum IOMMUMemoryRegionAttr {
 IOMMU_ATTR_SPAPR_TCE_FD,
 IOMMU_ATTR_VFIO_NESTED,
+IOMMU_ATTR_MSI_TRANSLATE,
 };
 
 /*
-- 
2.26.3

[RFC v9 13/29] vfio: Introduce hostwin_from_range helper

2021-04-11 Thread Eric Auger

Let's introduce a hostwin_from_range() helper that returns the
hostwin encapsulating an IOVA range or NULL if none is found.

This improves the readability of callers and removes the usage
of hostwin_found.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 30dc45df90..a8f835328e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -649,6 +649,19 @@ out:
 rcu_read_unlock();
 }
 
+static VFIOHostDMAWindow *
+hostwin_from_range(VFIOContainer *container, hwaddr iova, hwaddr end)
+{
+VFIOHostDMAWindow *hostwin;
+
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
+return hostwin;
+}
+}
+return NULL;
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -658,7 +671,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
 void *vaddr;
 int ret;
 VFIOHostDMAWindow *hostwin;
-bool hostwin_found;
 Error *err = NULL;
 
 if (vfio_listener_skipped_section(section)) {
@@ -744,15 +756,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
 #endif
 }
 
-hostwin_found = false;
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-hostwin_found = true;
-break;
-}
-}
-
-if (!hostwin_found) {
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
 error_setg(&err, "Container %p can't map guest IOVA region"
" 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
 goto fail;
@@ -934,16 +939,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
 
 if (memory_region_is_ram_device(section->mr)) {
 hwaddr pgmask;
-VFIOHostDMAWindow *hostwin;
-bool hostwin_found = false;
+VFIOHostDMAWindow *hostwin = hostwin_from_range(container, iova, end);
 
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-hostwin_found = true;
-break;
-}
-}
-assert(hostwin_found); /* or region_add() would have failed */
+assert(hostwin); /* or region_add() would have failed */
 
 pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
 try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
-- 
2.26.3
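
For readers unfamiliar with the host window list, the containment test performed by hostwin_from_range() can be sketched in isolation. This is a simplified illustration: `struct demo_window` and `demo_window_from_range()` are hypothetical names, and an array replaces the QLIST used in QEMU. Both bounds are inclusive, as in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VFIOHostDMAWindow; both bounds are inclusive. */
struct demo_window {
    uint64_t min_iova;
    uint64_t max_iova;
};

/* Return the first window fully containing [iova, end], or NULL. */
static const struct demo_window *
demo_window_from_range(const struct demo_window *wins, size_t n,
                       uint64_t iova, uint64_t end)
{
    for (size_t i = 0; i < n; i++) {
        if (wins[i].min_iova <= iova && end <= wins[i].max_iova) {
            return &wins[i];
        }
    }
    return NULL;
}
```

A range that straddles the end of a window is rejected, which is why callers can treat a NULL result as "this IOVA region cannot be mapped".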

[RFC v9 14/29] vfio: Introduce helpers to DMA map/unmap a RAM section

2021-04-11 Thread Eric Auger

Let's introduce two helpers that DMA map/unmap a RAM
section. Those helpers will be called for nested stage setup in
another call site. This also makes the structure of
vfio_listener_region_add/del() clearer.

Signed-off-by: Eric Auger 

---

v8 -> v9
- rebase on top of
  1eb7f642750c ("vfio: Support host translation granule size")

v5 -> v6:
- add Error **
---
 hw/vfio/common.c | 199 +--
 hw/vfio/trace-events |   4 +-
 2 files changed, 119 insertions(+), 84 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a8f835328e..0cd7ef2139 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -662,13 +662,126 @@ hostwin_from_range(VFIOContainer *container, hwaddr iova, hwaddr end)
 return NULL;
 }
 
+static int vfio_dma_map_ram_section(VFIOContainer *container,
+MemoryRegionSection *section, Error **err)
+{
+VFIOHostDMAWindow *hostwin;
+Int128 llend, llsize;
+hwaddr iova, end;
+void *vaddr;
+int ret;
+
+assert(memory_region_is_ram(section->mr));
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+end = int128_get64(int128_sub(llend, int128_one()));
+
+vaddr = memory_region_get_ram_ptr(section->mr) +
+section->offset_within_region +
+(iova - section->offset_within_address_space);
+
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
+error_setg(err, "Container %p can't map guest IOVA region"
+   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
+return -EFAULT;
+}
+
+trace_vfio_dma_map_ram(iova, end, vaddr);
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+
+if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
+trace_vfio_listener_region_add_no_dma_map(
+memory_region_name(section->mr),
+section->offset_within_address_space,
+int128_getlo(section->size),
+pgmask + 1);
+return 0;
+}
+}
+
+ret = vfio_dma_map(container, iova, int128_get64(llsize),
+   vaddr, section->readonly);
+if (ret) {
+error_setg(err, "vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+   "0x%"HWADDR_PRIx", %p) = %d (%m)",
+   container, iova, int128_get64(llsize), vaddr, ret);
+if (memory_region_is_ram_device(section->mr)) {
+/* Allow unexpected mappings not to be fatal for RAM devices */
+error_report_err(*err);
+return 0;
+}
+return ret;
+}
+return 0;
+}
+
+static void vfio_dma_unmap_ram_section(VFIOContainer *container,
+   MemoryRegionSection *section)
+{
+Int128 llend, llsize;
+hwaddr iova, end;
+bool try_unmap = true;
+int ret;
+
+iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask));
+
+if (int128_ge(int128_make64(iova), llend)) {
+return;
+}
+end = int128_get64(int128_sub(llend, int128_one()));
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+trace_vfio_dma_unmap_ram(iova, end);
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask;
+VFIOHostDMAWindow *hostwin = hostwin_from_range(container, iova, end);
+
+assert(hostwin); /* or region_add() would have failed */
+
+pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
+}
+
+if (try_unmap) {
+if (int128_eq(llsize, int128_2_64())) {
+/* The unmap ioctl doesn't accept a full 64-bit span. */
+llsize = int128_rshift(llsize, 1);
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
+if (ret) {
+error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, int128_get64(llsize), ret);
+}
+iova += int128_get64(llsize);
+}
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
+if (ret) {
+error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, int128_get64(llsize), ret);
+}
+}
+}
+
 static void vfio_listener_region_add(MemoryListener *li

[RFC v9 22/29] hw/arm/smmuv3: Store the PASID table GPA in the translation config

2021-04-11 Thread Eric Auger

For VFIO integration we will need to pass the Context Descriptor (CD)
table GPA to the host. The CD table is also referred to as the PASID
table. Its GPA corresponds to the s1ctxptr field of the Stream Table
Entry. So let's decode and store it in the configuration structure.

Signed-off-by: Eric Auger 
---
 include/hw/arm/smmu-common.h | 1 +
 hw/arm/smmuv3.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 706be3c6d0..d578339935 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -76,6 +76,7 @@ typedef struct SMMUTransCfg {
 uint8_t tbi;   /* Top Byte Ignore */
 uint16_t asid;
 SMMUTransTableInfo tt[2];
+dma_addr_t s1ctxptr;
 uint32_t iotlb_hits;   /* counts IOTLB hits for this asid */
 uint32_t iotlb_misses; /* counts IOTLB misses for this asid */
 } SMMUTransCfg;
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 1ee81a25e9..a7608af5dd 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -358,6 +358,7 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
   "SMMUv3 S1 stalling fault model not allowed yet\n");
 goto bad_ste;
 }
+cfg->s1ctxptr = STE_CTXPTR(ste);
 return 0;
 
 bad_ste:
-- 
2.26.3

[RFC v9 15/29] vfio: Set up nested stage mappings

2021-04-11 Thread Eric Auger

In nested mode, the legacy vfio_iommu_map_notify cannot be used as
there is no "caching" mode and we do not trap on map.

On Intel, vfio_iommu_map_notify was used to DMA map the RAM
through the host single stage.

With nested mode, we need to set up stage 2 and stage 1
separately. This patch introduces a prereg_listener to set up
the stage 2 mapping.

The stage 1 mapping, owned by the guest, is passed to the host
when the guest invalidates the stage 1 configuration, through
a dedicated PCIPASIDOps callback. Guest IOTLB invalidations
are cascaded down to the host through another IOMMU MR UNMAP
notifier.

Signed-off-by: Eric Auger 

---

v7 -> v8:
- properly handle new IOMMUTLBEntry fields and especially
  propagate DOMAIN and PASID based invalidations

v6 -> v7:
- remove PASID based invalidation

v5 -> v6:
- add error_report_err()
- remove the abort in case of nested stage case

v4 -> v5:
- use VFIO_IOMMU_SET_PASID_TABLE
- use PCIPASIDOps for config notification

v3 -> v4:
- use iommu_inv_pasid_info for ASID invalidation

v2 -> v3:
- use VFIO_IOMMU_ATTACH_PASID_TABLE
- new user API
- handle leaf

v1 -> v2:
- adapt to uapi changes
- pass the asid
- pass IOMMU_NOTIFIER_S1_CFG when initializing the config notifier
---
 hw/vfio/common.c | 139 +--
 hw/vfio/pci.c|  21 +++
 hw/vfio/trace-events |   2 +
 3 files changed, 157 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0cd7ef2139..e369d451e7 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -595,6 +595,73 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
 return true;
 }
 
+/* Propagate a guest IOTLB invalidation to the host (nested mode) */
+static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+struct vfio_iommu_type1_cache_invalidate ustruct = {};
+VFIOContainer *container = giommu->container;
+int ret;
+
+assert(iotlb->perm == IOMMU_NONE);
+
+ustruct.argsz = sizeof(ustruct);
+ustruct.flags = 0;
+ustruct.info.argsz = sizeof(struct iommu_cache_invalidate_info);
+ustruct.info.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
+ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+
+switch (iotlb->granularity) {
+case IOMMU_INV_GRAN_DOMAIN:
+ustruct.info.granularity = IOMMU_INV_GRANU_DOMAIN;
+break;
+case IOMMU_INV_GRAN_PASID:
+{
+struct iommu_inv_pasid_info *pasid_info;
+int archid = -1;
+
+pasid_info = &ustruct.info.granu.pasid_info;
+ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
+if (iotlb->flags & IOMMU_INV_FLAGS_ARCHID) {
+pasid_info->flags |= IOMMU_INV_ADDR_FLAGS_ARCHID;
+archid = iotlb->arch_id;
+}
+pasid_info->archid = archid;
+trace_vfio_iommu_asid_inv_iotlb(archid);
+break;
+}
+case IOMMU_INV_GRAN_ADDR:
+{
+hwaddr start = iotlb->iova + giommu->iommu_offset;
+struct iommu_inv_addr_info *addr_info;
+size_t size = iotlb->addr_mask + 1;
+int archid = -1;
+
+addr_info = &ustruct.info.granu.addr_info;
+ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
+if (iotlb->leaf) {
+addr_info->flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
+}
+if (iotlb->flags & IOMMU_INV_FLAGS_ARCHID) {
+addr_info->flags |= IOMMU_INV_ADDR_FLAGS_ARCHID;
+archid = iotlb->arch_id;
+}
+addr_info->archid = archid;
+addr_info->addr = start;
+addr_info->granule_size = size;
+addr_info->nb_granules = 1;
+trace_vfio_iommu_addr_inv_iotlb(archid, start, size,
+1, iotlb->leaf);
+break;
+}
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_CACHE_INVALIDATE, &ustruct);
+if (ret) {
+error_report("%p: failed to invalidate CACHE (%d)", container, ret);
+}
+}
+
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
@@ -776,6 +843,35 @@ static void vfio_dma_unmap_ram_section(VFIOContainer *container,
 }
 }
 
+static void vfio_prereg_listener_region_add(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+VFIOContainer *container =
+container_of(listener, VFIOContainer, prereg_listener);
+Error *err = NULL;
+
+if (!memory_region_is_ram(section->mr)) {
+return;
+}
+
+vfio_dma_map_ram_section(container, section, &err);
+if (err) {
+error_report_err(err);
+}
+}
+static void vfio_prereg_listener_region_del(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+VFIOContainer *container =
+container_of(listener, VFIOContainer, prereg_listener);
+
+if (!memory_region

[RFC v9 12/29] vfio: Force nested if iommu requires it

2021-04-11 Thread Eric Auger

In case we detect the address space is translated by
a virtual IOMMU which requires HW nested paging to
integrate with VFIO, let's set up the container with
the VFIO_TYPE1_NESTING_IOMMU iommu_type.

Signed-off-by: Eric Auger 

---

v7 -> v8
- remove as != &address_space_memory as
  memory_region_is_iommu(as->root) is sufficient [Kunkun]

v4 -> v5:
- fail immediately if nested is wanted but not supported

v2 -> v3:
- add "nested only is selected if requested by @force_nested"
  comment in this patch
---
 hw/vfio/common.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a456455517..30dc45df90 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1579,27 +1579,38 @@ static void vfio_put_address_space(VFIOAddressSpace *space)
  * vfio_get_iommu_type - selects the richest iommu_type (v2 first)
  */
 static int vfio_get_iommu_type(VFIOContainer *container,
+   bool want_nested,
Error **errp)
 {
-int iommu_types[] = { VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
+int iommu_types[] = { VFIO_TYPE1_NESTING_IOMMU,
+  VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
   VFIO_SPAPR_TCE_v2_IOMMU, VFIO_SPAPR_TCE_IOMMU };
-int i;
+int i, ret = -EINVAL;
 
 for (i = 0; i < ARRAY_SIZE(iommu_types); i++) {
 if (ioctl(container->fd, VFIO_CHECK_EXTENSION, iommu_types[i])) {
-return iommu_types[i];
+if (iommu_types[i] == VFIO_TYPE1_NESTING_IOMMU && !want_nested) {
+continue;
+}
+ret = iommu_types[i];
+break;
 }
 }
-error_setg(errp, "No available IOMMU models");
-return -EINVAL;
+if (ret < 0) {
+error_setg(errp, "No available IOMMU models");
+} else if (want_nested && ret != VFIO_TYPE1_NESTING_IOMMU) {
+error_setg(errp, "Nested mode requested but not supported");
+ret = -EINVAL;
+}
+return ret;
 }
 
 static int vfio_init_container(VFIOContainer *container, int group_fd,
-   Error **errp)
+   bool want_nested, Error **errp)
 {
 int iommu_type, ret;
 
-iommu_type = vfio_get_iommu_type(container, errp);
+iommu_type = vfio_get_iommu_type(container, want_nested, errp);
 if (iommu_type < 0) {
 return iommu_type;
 }
@@ -1704,6 +1715,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 VFIOContainer *container;
 int ret, fd;
 VFIOAddressSpace *space;
+IOMMUMemoryRegion *iommu_mr;
+bool nested = false;
+
+if (memory_region_is_iommu(as->root)) {
+iommu_mr = IOMMU_MEMORY_REGION(as->root);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+ (void *)&nested);
+}
 
 space = vfio_get_address_space(as);
 
@@ -1773,13 +1792,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 QLIST_INIT(&container->giommu_list);
 QLIST_INIT(&container->hostwin_list);
 
-ret = vfio_init_container(container, group->fd, errp);
+ret = vfio_init_container(container, group->fd, nested, errp);
 if (ret) {
 goto free_container_exit;
 }
 trace_vfio_connect_new_container(group->groupid, container->fd);
 
 switch (container->iommu_type) {
+case VFIO_TYPE1_NESTING_IOMMU:
 case VFIO_TYPE1v2_IOMMU:
 case VFIO_TYPE1_IOMMU:
 {
-- 
2.26.3
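
The selection policy of the reworked vfio_get_iommu_type() can be summarised with a small standalone sketch. It is an illustration only: the `demo_*` names and numeric type values are placeholders (the real constants come from the VFIO uapi header), and a predicate callback replaces the VFIO_CHECK_EXTENSION ioctl.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder identifiers; the values are arbitrary, not the uapi ones. */
enum demo_iommu_type {
    DEMO_TYPE1 = 1,
    DEMO_TYPE1V2 = 2,
    DEMO_TYPE1_NESTING = 3,
};

/* Hypothetical replacement for the VFIO_CHECK_EXTENSION ioctl. */
typedef bool (*demo_supported_fn)(int type);

/*
 * Walk the preference list (richest first). The nesting type is only
 * eligible when the caller asked for it, and asking for it without
 * getting it is an error.
 */
static int demo_pick_iommu_type(demo_supported_fn supported, bool want_nested)
{
    static const int prefs[] = {
        DEMO_TYPE1_NESTING, DEMO_TYPE1V2, DEMO_TYPE1,
    };
    int ret = -EINVAL;

    for (size_t i = 0; i < sizeof(prefs) / sizeof(prefs[0]); i++) {
        if (!supported(prefs[i])) {
            continue;
        }
        if (prefs[i] == DEMO_TYPE1_NESTING && !want_nested) {
            continue;
        }
        ret = prefs[i];
        break;
    }
    if (want_nested && ret != DEMO_TYPE1_NESTING) {
        ret = -EINVAL;
    }
    return ret;
}

/* Two example hosts: one supports every type, one lacks nesting. */
static bool demo_host_all(int type)
{
    (void)type;
    return true;
}

static bool demo_host_no_nesting(int type)
{
    return type != DEMO_TYPE1_NESTING;
}
```

On a host that supports nesting, a non-nested request still falls through to the v2 type, so existing setups keep their current behaviour.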

[RFC v9 16/29] vfio: Pass stage 1 MSI bindings to the host

2021-04-11 Thread Eric Auger

We register the stage1 MSI bindings when enabling the vectors
and we unregister them on msi disable.

Signed-off-by: Eric Auger 

---

v7 -> v8:
- add unregistration on msix_disable
- remove vfio_container_unbind_msis()

v4 -> v5:
- use VFIO_IOMMU_SET_MSI_BINDING

v2 -> v3:
- only register the notifier if the IOMMU translates MSIs
- record the msi bindings in a container list and unregister on
  container release
---
 include/hw/vfio/vfio-common.h | 12 ++
 hw/vfio/common.c  | 59 +++
 hw/vfio/pci.c | 76 ++-
 hw/vfio/trace-events  |  2 +
 4 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 6141162d7a..f30133b2a3 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -74,6 +74,14 @@ typedef struct VFIOAddressSpace {
 QLIST_ENTRY(VFIOAddressSpace) list;
 } VFIOAddressSpace;
 
+typedef struct VFIOMSIBinding {
+int index;
+hwaddr iova;
+hwaddr gpa;
+hwaddr size;
+QLIST_ENTRY(VFIOMSIBinding) next;
+} VFIOMSIBinding;
+
 struct VFIOGroup;
 
 typedef struct VFIOContainer {
@@ -91,6 +99,7 @@ typedef struct VFIOContainer {
 QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
 QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
 QLIST_HEAD(, VFIOGroup) group_list;
+QLIST_HEAD(, VFIOMSIBinding) msibinding_list;
 QLIST_ENTRY(VFIOContainer) next;
 } VFIOContainer;
 
@@ -200,6 +209,9 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
 VFIODevice *vbasedev, Error **errp);
+int vfio_iommu_set_msi_binding(VFIOContainer *container, int n,
+   IOMMUTLBEntry *entry);
+int vfio_iommu_unset_msi_binding(VFIOContainer *container, int n);
 
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e369d451e7..970a5a7be7 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -662,6 +662,65 @@ static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 }
 }
 
+int vfio_iommu_set_msi_binding(VFIOContainer *container, int n,
+   IOMMUTLBEntry *iotlb)
+{
+struct vfio_iommu_type1_set_msi_binding ustruct;
+VFIOMSIBinding *binding;
+int ret;
+
+QLIST_FOREACH(binding, &container->msibinding_list, next) {
+if (binding->index == n) {
+return 0;
+}
+}
+
+ustruct.argsz = sizeof(struct vfio_iommu_type1_set_msi_binding);
+ustruct.iova = iotlb->iova;
+ustruct.flags = VFIO_IOMMU_BIND_MSI;
+ustruct.gpa = iotlb->translated_addr;
+ustruct.size = iotlb->addr_mask + 1;
+ret = ioctl(container->fd, VFIO_IOMMU_SET_MSI_BINDING , &ustruct);
+if (ret) {
+error_report("%s: failed to register the stage1 MSI binding (%m)",
+ __func__);
+return ret;
+}
+binding =  g_new0(VFIOMSIBinding, 1);
+binding->iova = ustruct.iova;
+binding->gpa = ustruct.gpa;
+binding->size = ustruct.size;
+binding->index = n;
+
+QLIST_INSERT_HEAD(&container->msibinding_list, binding, next);
+return 0;
+}
+
+int vfio_iommu_unset_msi_binding(VFIOContainer *container, int n)
+{
+struct vfio_iommu_type1_set_msi_binding ustruct;
+VFIOMSIBinding *binding, *tmp;
+int ret;
+
+ustruct.argsz = sizeof(struct vfio_iommu_type1_set_msi_binding);
+QLIST_FOREACH_SAFE(binding, &container->msibinding_list, next, tmp) {
+if (binding->index != n) {
+continue;
+}
+ustruct.flags = VFIO_IOMMU_UNBIND_MSI;
+ustruct.iova = binding->iova;
+ret = ioctl(container->fd, VFIO_IOMMU_SET_MSI_BINDING , &ustruct);
+if (ret) {
+error_report("Failed to unregister the stage1 MSI binding "
+ "for iova=0x%"PRIx64" (%m)", binding->iova);
+}
+QLIST_REMOVE(binding, next);
+g_free(binding);
+return ret;
+}
+return 0;
+}
+
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index cad7deec71..a49029dfa4 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -366,6 +366,65 @@ static void vfio_msi_interrupt(void *opaque)
 notify(&vdev->pdev, nr);
 }
 
+static bool vfio_iommu_require_msi_binding(IOMMUMemoryRegion *iommu_mr)
+{
+bool msi_translate = false, nested = false;
+
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_MSI_TRANSLATE,
+ (void *)&msi_translate);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+ (void *)&nested);
+if (!nested || !msi_translate) {
+

[RFC v9 23/29] hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation

2021-04-11 Thread Eric Auger

When the guest invalidates one S1 entry, it passes the asid.
When propagating this invalidation down to the host, the asid
information must also be passed. So let's fill the arch_id field
introduced for that purpose and set the flags accordingly to
indicate its presence.

Signed-off-by: Eric Auger 

---

v7 -> v8:
- set flags
---
 hw/arm/smmuv3.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index a7608af5dd..7beb55cd89 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -832,6 +832,8 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 event.entry.iova = iova;
 event.entry.addr_mask = num_pages * (1 << granule) - 1;
 event.entry.perm = IOMMU_NONE;
+event.entry.flags = IOMMU_INV_FLAGS_ARCHID;
+event.entry.arch_id = asid;
 
 memory_region_notify_iommu_one(n, &event);
 }
-- 
2.26.3

[RFC v9 17/29] vfio: Helper to get IRQ info including capabilities

2021-04-11 Thread Eric Auger

As done for vfio regions, add helpers to retrieve irq info
including their optional capabilities.

Signed-off-by: Eric Auger 
---
 include/hw/vfio/vfio-common.h |  7 +++
 hw/vfio/common.c  | 97 +++
 hw/vfio/trace-events  |  1 +
 3 files changed, 105 insertions(+)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f30133b2a3..fcbda2d071 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -232,6 +232,13 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
  unsigned int *avail);
 struct vfio_info_cap_header *
 vfio_get_device_info_cap(struct vfio_device_info *info, uint16_t id);
+int vfio_get_irq_info(VFIODevice *vbasedev, int index,
+  struct vfio_irq_info **info);
+int vfio_get_dev_irq_info(VFIODevice *vbasedev, uint32_t type,
+  uint32_t subtype, struct vfio_irq_info **info);
+bool vfio_has_irq_cap(VFIODevice *vbasedev, int irq, uint16_t cap_type);
+struct vfio_info_cap_header *
+vfio_get_irq_info_cap(struct vfio_irq_info *info, uint16_t id);
 #endif
 extern const MemoryListener vfio_prereg_listener;
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 970a5a7be7..dc8372c772 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1452,6 +1452,25 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
 return true;
 }
 
+struct vfio_info_cap_header *
+vfio_get_irq_info_cap(struct vfio_irq_info *info, uint16_t id)
+{
+struct vfio_info_cap_header *hdr;
+void *ptr = info;
+
+if (!(info->flags & VFIO_IRQ_INFO_FLAG_CAPS)) {
+return NULL;
+}
+
+for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+if (hdr->id == id) {
+return hdr;
+}
+}
+
+return NULL;
+}
+
 static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
   struct vfio_region_info *info)
 {
@@ -2385,6 +2404,33 @@ retry:
 return 0;
 }
 
+int vfio_get_irq_info(VFIODevice *vbasedev, int index,
+  struct vfio_irq_info **info)
+{
+size_t argsz = sizeof(struct vfio_irq_info);
+
+*info = g_malloc0(argsz);
+
+(*info)->index = index;
+retry:
+(*info)->argsz = argsz;
+
+if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, *info)) {
+g_free(*info);
+*info = NULL;
+return -errno;
+}
+
+if ((*info)->argsz > argsz) {
+argsz = (*info)->argsz;
+*info = g_realloc(*info, argsz);
+
+goto retry;
+}
+
+return 0;
+}
+
 int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
  uint32_t subtype, struct vfio_region_info **info)
 {
@@ -2420,6 +2466,42 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
 return -ENODEV;
 }
 
+int vfio_get_dev_irq_info(VFIODevice *vbasedev, uint32_t type,
+  uint32_t subtype, struct vfio_irq_info **info)
+{
+int i;
+
+for (i = 0; i < vbasedev->num_irqs; i++) {
+struct vfio_info_cap_header *hdr;
+struct vfio_irq_info_cap_type *cap_type;
+
+if (vfio_get_irq_info(vbasedev, i, info)) {
+continue;
+}
+
+hdr = vfio_get_irq_info_cap(*info, VFIO_IRQ_INFO_CAP_TYPE);
+if (!hdr) {
+g_free(*info);
+continue;
+}
+
+cap_type = container_of(hdr, struct vfio_irq_info_cap_type, header);
+
+trace_vfio_get_dev_irq(vbasedev->name, i,
+   cap_type->type, cap_type->subtype);
+
+if (cap_type->type == type && cap_type->subtype == subtype) {
+return 0;
+}
+
+g_free(*info);
+}
+
+*info = NULL;
+return -ENODEV;
+}
+
+
 bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
 {
 struct vfio_region_info *info = NULL;
@@ -2435,6 +2517,21 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int 
region, uint16_t cap_type)
 return ret;
 }
 
+bool vfio_has_irq_cap(VFIODevice *vbasedev, int irq, uint16_t cap_type)
+{
+struct vfio_irq_info *info = NULL;
+bool ret = false;
+
+if (!vfio_get_irq_info(vbasedev, irq, &info)) {
+if (vfio_get_irq_info_cap(info, cap_type)) {
+ret = true;
+}
+g_free(info);
+}
+
+return ret;
+}
+
 /*
  * Interfaces for IBM EEH (Enhanced Error Handling)
  */
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 5c1b28d0d4..1d87c40c1b 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -119,6 +119,7 @@ vfio_region_unmap(const char *name, unsigned long offset, 
unsigned long end) "Re
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) 
"Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) 
"sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_d
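
For readers following the capability lookup in vfio_get_irq_info_cap(), a small standalone sketch of the same chain walk may help. It assumes a hypothetical demo header (cap_demo_header) shaped like struct vfio_info_cap_header, where "next" is a byte offset from the start of the info buffer and 0 ends the chain. The buffer layout, the IDs and the bounds check are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct vfio_info_cap_header. */
struct cap_demo_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;      /* byte offset of the next capability, 0 = end */
};

/*
 * Walk a capability chain stored inside 'buf'. 'first' is the byte offset
 * of the first capability (the role played by info->cap_offset).
 * Returns the offset of the capability with the given id, or 0 if absent.
 */
static uint32_t cap_demo_find(const uint8_t *buf, size_t len,
                              uint32_t first, uint16_t id)
{
    uint32_t off = first;

    assert(buf != NULL);

    while (off != 0) {
        struct cap_demo_header hdr;

        /* Refuse to read past the end of the buffer (sketch-only check). */
        if (off > len || len - off < sizeof(hdr)) {
            return 0;
        }
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.id == id) {
            return off;
        }
        off = hdr.next;
    }
    return 0;
}

/* Build a two-entry chain at offsets 16 and 32, then look up 'id'. */
static uint32_t cap_demo_lookup(uint16_t id)
{
    uint8_t buf[64] = { 0 };
    struct cap_demo_header a = { .id = 1, .version = 1, .next = 32 };
    struct cap_demo_header b = { .id = 3, .version = 1, .next = 0 };

    memcpy(buf + 16, &a, sizeof(a));
    memcpy(buf + 32, &b, sizeof(b));
    return cap_demo_find(buf, sizeof(buf), 16, id);
}
```

The loop ends on a zero "next" offset, which corresponds to the "hdr != ptr" condition in the patch.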

[RFC v9 19/29] vfio/pci: Set up the DMA FAULT region

2021-04-11 Thread Eric Auger

Set up the fault region which is composed of the actual fault
queue (mmappable) and a header used to handle it. The fault
queue is mmapped.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- use a single DMA FAULT region. No version selection anymore
---
 hw/vfio/pci.h |  1 +
 hw/vfio/pci.c | 64 +++
 2 files changed, 65 insertions(+)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index a8b06737fb..eef91065f1 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -145,6 +145,7 @@ struct VFIOPCIDevice {
 EventNotifier err_notifier;
 EventNotifier req_notifier;
 VFIOPCIExtIRQ *ext_irqs;
+VFIORegion dma_fault_region;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 71b411b61c..9d4e020b97 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2614,11 +2614,67 @@ int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
 return 0;
 }
 
+static void vfio_init_fault_regions(VFIOPCIDevice *vdev, Error **errp)
+{
+struct vfio_region_info *fault_region_info = NULL;
+struct vfio_region_info_cap_fault *cap_fault;
+VFIODevice *vbasedev = &vdev->vbasedev;
+struct vfio_info_cap_header *hdr;
+char *fault_region_name;
+int ret;
+
+ret = vfio_get_dev_region_info(&vdev->vbasedev,
+   VFIO_REGION_TYPE_NESTED,
+   VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT,
+   &fault_region_info);
+if (ret) {
+goto out;
+}
+
+hdr = vfio_get_region_info_cap(fault_region_info,
+   VFIO_REGION_INFO_CAP_DMA_FAULT);
+if (!hdr) {
+error_setg(errp, "failed to retrieve DMA FAULT capability");
+goto out;
+}
+cap_fault = container_of(hdr, struct vfio_region_info_cap_fault,
+ header);
+if (cap_fault->version != 1) {
+error_setg(errp, "Unsupported DMA FAULT API version %d",
+   cap_fault->version);
+goto out;
+}
+
+fault_region_name = g_strdup_printf("%s DMA FAULT %d",
+vbasedev->name,
+fault_region_info->index);
+
+ret = vfio_region_setup(OBJECT(vdev), vbasedev,
+&vdev->dma_fault_region,
+fault_region_info->index,
+fault_region_name);
+g_free(fault_region_name);
+if (ret) {
+error_setg_errno(errp, -ret,
+ "failed to set up the DMA FAULT region %d",
+ fault_region_info->index);
+goto out;
+}
+
+ret = vfio_region_mmap(&vdev->dma_fault_region);
+if (ret) {
+error_setg_errno(errp, -ret, "Failed to mmap the DMA FAULT queue");
+}
+out:
+g_free(fault_region_info);
+}
+
 static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
 struct vfio_region_info *reg_info;
 struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+Error *err = NULL;
 int i, ret = -1;
 
 /* Sanity check device */
@@ -2682,6 +2738,12 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, 
Error **errp)
 }
 }
 
+vfio_init_fault_regions(vdev, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
 irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
 ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
@@ -3274,6 +3336,7 @@ static void vfio_instance_finalize(Object *obj)
 
 vfio_display_finalize(vdev);
 vfio_bars_finalize(vdev);
+vfio_region_finalize(&vdev->dma_fault_region);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
 /*
@@ -3294,6 +3357,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
 vfio_unregister_ext_irq_notifiers(vdev);
+vfio_region_exit(&vdev->dma_fault_region);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
 if (vdev->irqchip_change_notifier.notify) {
 kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
-- 
2.26.3
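
As a rough illustration of a region made of a header plus a queue, the sketch below models a ring buffer with a producer-owned head index and a consumer-owned tail index. Everything here is an assumption for illustration: the ring_demo_* names, the header fields and the uint32_t records are hypothetical, and the real layout is whatever the kernel uAPI defines. Overflow handling is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define RING_DEMO_NB_ENTRIES 4

/* Hypothetical header; the real layout is defined by the kernel uAPI. */
struct ring_demo_header {
    uint32_t entry_size;
    uint32_t nb_entries;
    uint32_t head;          /* next slot the producer will fill */
};

struct ring_demo_region {
    struct ring_demo_header hdr;
    uint32_t queue[RING_DEMO_NB_ENTRIES];  /* stands in for the mmapped queue */
};

/* Producer side: store one record and advance head, wrapping around. */
static void ring_demo_push(struct ring_demo_region *r, uint32_t record)
{
    r->queue[r->hdr.head] = record;
    r->hdr.head = (r->hdr.head + 1) % r->hdr.nb_entries;
}

/*
 * Consumer side: drain every record between *tail and head, summing them
 * so the caller can check what was seen. Returns the number consumed.
 */
static int ring_demo_drain(struct ring_demo_region *r, uint32_t *tail,
                           uint32_t *sum)
{
    int consumed = 0;

    while (*tail != r->hdr.head) {
        *sum += r->queue[*tail];
        *tail = (*tail + 1) % r->hdr.nb_entries;
        consumed++;
    }
    return consumed;
}

/* Push three records, drain them, and encode count and sum in one int. */
static int ring_demo_run(void)
{
    struct ring_demo_region r = {
        .hdr = { .entry_size = sizeof(uint32_t),
                 .nb_entries = RING_DEMO_NB_ENTRIES, .head = 0 },
    };
    uint32_t tail = 0, sum = 0;
    int n;

    ring_demo_push(&r, 10);
    ring_demo_push(&r, 20);
    ring_demo_push(&r, 30);
    n = ring_demo_drain(&r, &tail, &sum);
    assert(tail == r.hdr.head);
    return n * 100 + (int)sum;
}
```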

[RFC v9 27/29] hw/arm/smmuv3: Allow MAP notifiers

2021-04-11 Thread Eric Auger

We now have all bricks to support nested paging. This
uses MAP notifiers to map the MSIs. So let's allow MAP
notifiers to be registered.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 53b71c895c..ca690513e6 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1620,14 +1620,6 @@ static int smmuv3_notify_flag_changed(IOMMUMemoryRegion 
*iommu,
 return -EINVAL;
 }
 
-if (new & IOMMU_NOTIFIER_MAP) {
-error_setg(errp,
-   "device %02x.%02x.%x requires iommu MAP notifier which is "
-   "not currently supported", pci_bus_num(sdev->bus),
-   PCI_SLOT(sdev->devfn), PCI_FUNC(sdev->devfn));
-return -EINVAL;
-}
-
 if (old == IOMMU_NOTIFIER_NONE) {
 trace_smmuv3_notify_flag_add(iommu->parent_obj.name);
 QLIST_INSERT_HEAD(&s->devices_with_notifiers, sdev, next);
-- 
2.26.3

[RFC v9 26/29] hw/arm/smmuv3: Implement fault injection

2021-04-11 Thread Eric Auger

We convert iommu_fault structs received from the kernel
into the data struct used by the emulation code and record
the events into the virtual event queue.

Signed-off-by: Eric Auger 

---

v3 -> v4:
- fix compile issue on mingw

Exhaustive mapping remains to be done
---
 hw/arm/smmuv3.c | 71 +
 1 file changed, 71 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index aefc55a607..53b71c895c 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1652,6 +1652,76 @@ static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
 return -EINVAL;
 }
 
+struct iommu_fault;
+
+static inline int
+smmuv3_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+ struct iommu_fault *buf)
+{
+#ifdef __linux__
+SMMUDevice *sdev = container_of(iommu_mr, SMMUDevice, iommu);
+SMMUv3State *s3 = sdev->smmu;
+uint32_t sid = smmu_get_sid(sdev);
+int i;
+
+for (i = 0; i < count; i++) {
+SMMUEventInfo info = {};
+struct iommu_fault_unrecoverable *record;
+
+if (buf[i].type != IOMMU_FAULT_DMA_UNRECOV) {
+continue;
+}
+
+info.sid = sid;
+record = &buf[i].event;
+
+switch (record->reason) {
+case IOMMU_FAULT_REASON_PASID_INVALID:
+info.type = SMMU_EVT_C_BAD_SUBSTREAMID;
+/* TODO further fill info.u.c_bad_substream */
+break;
+case IOMMU_FAULT_REASON_PASID_FETCH:
+info.type = SMMU_EVT_F_CD_FETCH;
+break;
+case IOMMU_FAULT_REASON_BAD_PASID_ENTRY:
+info.type = SMMU_EVT_C_BAD_CD;
+/* TODO further fill info.u.c_bad_cd */
+break;
+case IOMMU_FAULT_REASON_WALK_EABT:
+info.type = SMMU_EVT_F_WALK_EABT;
+info.u.f_walk_eabt.addr = record->addr;
+info.u.f_walk_eabt.addr2 = record->fetch_addr;
+break;
+case IOMMU_FAULT_REASON_PTE_FETCH:
+info.type = SMMU_EVT_F_TRANSLATION;
+info.u.f_translation.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_OOR_ADDRESS:
+info.type = SMMU_EVT_F_ADDR_SIZE;
+info.u.f_addr_size.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_ACCESS:
+info.type = SMMU_EVT_F_ACCESS;
+info.u.f_access.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_PERMISSION:
+info.type = SMMU_EVT_F_PERMISSION;
+info.u.f_permission.addr = record->addr;
+break;
+default:
+warn_report("%s Unexpected fault reason received from host: %d",
+__func__, record->reason);
+continue;
+}
+
+smmuv3_record_event(s3, &info);
+}
+return 0;
+#else
+return -1;
+#endif
+}
+
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
   void *data)
 {
@@ -1660,6 +1730,7 @@ static void 
smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
 imrc->translate = smmuv3_translate;
 imrc->notify_flag_changed = smmuv3_notify_flag_changed;
 imrc->get_attr = smmuv3_get_attr;
+imrc->inject_faults = smmuv3_inject_faults;
 }
 
 static const TypeInfo smmuv3_type_info = {
-- 
2.26.3

[RFC v9 21/29] hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute

2021-04-11 Thread Eric Auger

The SMMUv3 has the peculiarity of translating MSI
transactions. Let's advertise the corresponding
attribute.

Signed-off-by: Eric Auger 

---
---
 hw/arm/smmuv3.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 7166008ab0..1ee81a25e9 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1589,6 +1589,9 @@ static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
 if (attr == IOMMU_ATTR_VFIO_NESTED) {
 *(bool *) data = true;
 return 0;
+} else if (attr == IOMMU_ATTR_MSI_TRANSLATE) {
+*(bool *) data = true;
+return 0;
 }
 return -EINVAL;
 }
-- 
2.26.3

[RFC v9 29/29] vfio/pci: Implement return_page_response page response callback

2021-04-11 Thread Eric Auger

This patch implements the page response path. The
response is written into the page response ring buffer and then
the header's head index is updated. This path is not used
by this series. It is introduced here as a POC for vSVA/ARM
integration.

Signed-off-by: Eric Auger 

---

v11 -> v12:
- use VFIO_REGION_INFO_CAP_DMA_FAULT_RESPONSE [Shameer]
- fix hot del regression reported and fixed by Shameer
---
 hw/vfio/pci.h |   2 +
 hw/vfio/pci.c | 123 ++
 2 files changed, 125 insertions(+)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 03ac8919ef..61b3bf1303 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -147,6 +147,8 @@ struct VFIOPCIDevice {
 VFIOPCIExtIRQ *ext_irqs;
 VFIORegion dma_fault_region;
 uint32_t fault_tail_index;
+VFIORegion dma_fault_response_region;
+uint32_t fault_response_head_index;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d7e563859f..0f23c8f343 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2669,6 +2669,61 @@ out:
 g_free(fault_region_info);
 }
 
+static void vfio_init_fault_response_regions(VFIOPCIDevice *vdev, Error **errp)
+{
+struct vfio_region_info *fault_region_info = NULL;
+struct vfio_region_info_cap_fault *cap_fault;
+VFIODevice *vbasedev = &vdev->vbasedev;
+struct vfio_info_cap_header *hdr;
+char *fault_region_name;
+int ret;
+
+ret = vfio_get_dev_region_info(&vdev->vbasedev,
+   VFIO_REGION_TYPE_NESTED,
+   
VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT_RESPONSE,
+   &fault_region_info);
+if (ret) {
+goto out;
+}
+
+hdr = vfio_get_region_info_cap(fault_region_info,
+   VFIO_REGION_INFO_CAP_DMA_FAULT_RESPONSE);
+if (!hdr) {
+error_setg(errp, "failed to retrieve DMA FAULT RESPONSE capability");
+goto out;
+}
+cap_fault = container_of(hdr, struct vfio_region_info_cap_fault,
+ header);
+if (cap_fault->version != 1) {
+error_setg(errp, "Unsupported DMA FAULT RESPONSE API version %d",
+   cap_fault->version);
+goto out;
+}
+
+fault_region_name = g_strdup_printf("%s DMA FAULT RESPONSE %d",
+vbasedev->name,
+fault_region_info->index);
+
+ret = vfio_region_setup(OBJECT(vdev), vbasedev,
+&vdev->dma_fault_response_region,
+fault_region_info->index,
+fault_region_name);
+g_free(fault_region_name);
+if (ret) {
+error_setg_errno(errp, -ret,
+ "failed to set up the DMA FAULT RESPONSE region %d",
+ fault_region_info->index);
+goto out;
+}
+
+ret = vfio_region_mmap(&vdev->dma_fault_response_region);
+if (ret) {
+error_setg_errno(errp, -ret, "Failed to mmap the DMA FAULT RESPONSE 
queue");
+}
+out:
+g_free(fault_region_info);
+}
+
 static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
@@ -2744,6 +2799,12 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, 
Error **errp)
 return;
 }
 
+vfio_init_fault_response_regions(vdev, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
 irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
 ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
@@ -2922,8 +2983,68 @@ static int vfio_iommu_set_pasid_table(PCIBus *bus, 
int32_t devfn,
 return ioctl(container->fd, VFIO_IOMMU_SET_PASID_TABLE, &info);
 }
 
+static int vfio_iommu_return_page_response(PCIBus *bus, int32_t devfn,
+   IOMMUPageResponse *resp)
+{
+PCIDevice *pdev = bus->devices[devfn];
+VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+struct iommu_page_response *response = &resp->resp;
+struct vfio_region_dma_fault_response header;
+struct iommu_page_response *queue;
+char *queue_buffer = NULL;
+ssize_t bytes;
+
+if (!vdev->dma_fault_response_region.mem) {
+return -EINVAL;
+}
+
+/* read the header */
+bytes = pread(vdev->vbasedev.fd, &header, sizeof(header),
+  vdev->dma_fault_response_region.fd_offset);
+if (bytes != sizeof(header)) {
+error_report("%s unable to read the fault region header (0x%lx)",
+ __func__, bytes);
+return -1;
+}
+
+/* Normally the fault queue is mmapped */
+queue = (struct iommu_page_response 
*)vdev->dma_fault_response_region.mmaps[0].mmap;
+if (!queue) {
+size_t queue_size = header.nb_entries * header.entry_size;
+
+error_report("%s: fault queue not mma

[RFC v9 24/29] hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation

2021-04-11 Thread Eric Auger

Let's propagate the leaf attribute throughout the invalidation path.
This hint is used to reduce the scope of the invalidations to the
last level of translation. Not enforcing it induces large performance
penalties in nested mode.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 7beb55cd89..74a6408146 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -799,7 +799,7 @@ epilogue:
 static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
IOMMUNotifier *n,
int asid, dma_addr_t iova,
-   uint8_t tg, uint64_t num_pages)
+   uint8_t tg, uint64_t num_pages, bool leaf)
 {
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
 IOMMUTLBEvent event = {};
@@ -834,6 +834,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 event.entry.perm = IOMMU_NONE;
 event.entry.flags = IOMMU_INV_FLAGS_ARCHID;
 event.entry.arch_id = asid;
+event.entry.leaf = leaf;
 
 memory_region_notify_iommu_one(n, &event);
 }
@@ -865,7 +866,7 @@ static void smmuv3_notify_asid(IOMMUMemoryRegion *mr,
 
 /* invalidate an asid/iova range tuple in all mr's */
 static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova,
-  uint8_t tg, uint64_t num_pages)
+  uint8_t tg, uint64_t num_pages, bool 
leaf)
 {
 SMMUDevice *sdev;
 
@@ -877,7 +878,7 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int 
asid, dma_addr_t iova,
 tg, num_pages);
 
 IOMMU_NOTIFIER_FOREACH(n, mr) {
-smmuv3_notify_iova(mr, n, asid, iova, tg, num_pages);
+smmuv3_notify_iova(mr, n, asid, iova, tg, num_pages, leaf);
 }
 }
 }
@@ -915,7 +916,7 @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
 count = mask + 1;
 
 trace_smmuv3_s1_range_inval(vmid, asid, addr, tg, count, ttl, leaf);
-smmuv3_inv_notifiers_iova(s, asid, addr, tg, count);
+smmuv3_inv_notifiers_iova(s, asid, addr, tg, count, leaf);
 smmu_iotlb_inv_iova(s, asid, addr, tg, count, ttl);
 
 num_pages -= count;
-- 
2.26.3

[RFC v9 25/29] hw/arm/smmuv3: Pass stage 1 configurations to the host

2021-04-11 Thread Eric Auger

In case PASID PciOps are set for the device we call
the set_pasid_table() callback on each STE update.

This allows the guest stage 1 configuration to be passed
to the host and applied at the physical level.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- Use PciOps instead of config notifiers

v3 -> v4:
- fix compile issue with mingw

v2 -> v3:
- adapt to pasid_cfg field changes. Use local variable
- add trace event
- set version fields
- use CONFIG_PASID

v1 -> v2:
- do not notify anymore on CD change. Anyway the smmuv3 linux
  driver is not sending any CD invalidation commands. If we were
  to propagate CD invalidation commands, we would use the
  CACHE_INVALIDATE VFIO ioctl.
- notify a precise config flags to prepare for addition of new
  flags
---
 hw/arm/smmu-internal.h |  1 +
 hw/arm/smmuv3.c| 72 --
 hw/arm/trace-events|  1 +
 3 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h
index 2d75b31953..5ef8c598c6 100644
--- a/hw/arm/smmu-internal.h
+++ b/hw/arm/smmu-internal.h
@@ -105,6 +105,7 @@ typedef struct SMMUIOTLBPageInvInfo {
 } SMMUIOTLBPageInvInfo;
 
 typedef struct SMMUSIDRange {
+SMMUState *state;
 uint32_t start;
 uint32_t end;
 } SMMUSIDRange;
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 74a6408146..aefc55a607 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -16,6 +16,10 @@
  * with this program; if not, see .
  */
 
+#ifdef __linux__
+#include "linux/iommu.h"
+#endif
+
 #include "qemu/osdep.h"
 #include "qemu/bitops.h"
 #include "hw/irq.h"
@@ -925,6 +929,61 @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
 }
 }
 
+static void smmuv3_notify_config_change(SMMUState *bs, uint32_t sid)
+{
+#ifdef __linux__
+IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, sid);
+SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
+   .inval_ste_allowed = true};
+IOMMUConfig iommu_config = {};
+SMMUTransCfg *cfg;
+SMMUDevice *sdev;
+
+if (!mr) {
+return;
+}
+
+sdev = container_of(mr, SMMUDevice, iommu);
+
+/* flush QEMU config cache */
+smmuv3_flush_config(sdev);
+
+if (!pci_device_is_pasid_ops_set(sdev->bus, sdev->devfn)) {
+return;
+}
+
+cfg = smmuv3_get_config(sdev, &event);
+
+if (!cfg) {
+return;
+}
+
+iommu_config.pasid_cfg.argsz = sizeof(struct iommu_pasid_table_config);
+iommu_config.pasid_cfg.version = PASID_TABLE_CFG_VERSION_1;
+iommu_config.pasid_cfg.format = IOMMU_PASID_FORMAT_SMMUV3;
+iommu_config.pasid_cfg.base_ptr = cfg->s1ctxptr;
+iommu_config.pasid_cfg.pasid_bits = 0;
+iommu_config.pasid_cfg.vendor_data.smmuv3.version = 
PASID_TABLE_SMMUV3_CFG_VERSION_1;
+
+if (cfg->disabled || cfg->bypassed) {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_BYPASS;
+} else if (cfg->aborted) {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_ABORT;
+} else {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_TRANSLATE;
+}
+
+trace_smmuv3_notify_config_change(mr->parent_obj.name,
+  iommu_config.pasid_cfg.config,
+  iommu_config.pasid_cfg.base_ptr);
+
+if (pci_device_set_pasid_table(sdev->bus, sdev->devfn, &iommu_config)) {
+error_report("Failed to pass PASID table to host for iommu mr %s (%m)",
+ mr->parent_obj.name);
+}
+#endif
+}
+
 static gboolean
 smmuv3_invalidate_ste(gpointer key, gpointer value, gpointer user_data)
 {
@@ -935,6 +994,7 @@ smmuv3_invalidate_ste(gpointer key, gpointer value, 
gpointer user_data)
 if (sid < sid_range->start || sid > sid_range->end) {
 return false;
 }
+smmuv3_notify_config_change(sid_range->state, sid);
 trace_smmuv3_config_cache_inv(sid);
 return true;
 }
@@ -1005,22 +1065,14 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 case SMMU_CMD_CFGI_STE:
 {
 uint32_t sid = CMD_SID(&cmd);
-IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, sid);
-SMMUDevice *sdev;
 
 if (CMD_SSEC(&cmd)) {
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
 
-if (!mr) {
-break;
-}
-
 trace_smmuv3_cmdq_cfgi_ste(sid);
-sdev = container_of(mr, SMMUDevice, iommu);
-smmuv3_flush_config(sdev);
-
+smmuv3_notify_config_change(bs, sid);
 break;
 }
 case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
@@ -1028,7 +1080,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 uint32_t start = CMD_SID(&cmd);
 uint8_t range = CMD_STE_RANGE(&cmd);
 uint64_t end = start + (1ULL << (range + 1)) - 1;
-SMMUSIDRange sid_range = {start, end};
+SMMUSIDRange sid_range = {bs

[RFC v9 28/29] pci: Add return_page_response pci ops

2021-04-11 Thread Eric Auger

Add a new PCI operation that allows page responses to be
returned to registered VFIO devices.

Signed-off-by: Eric Auger 
---
 include/hw/iommu/iommu.h |  8 
 include/hw/pci/pci.h |  4 
 hw/pci/pci.c | 16 
 3 files changed, 28 insertions(+)

diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
index 12092bda7b..5890f095b1 100644
--- a/include/hw/iommu/iommu.h
+++ b/include/hw/iommu/iommu.h
@@ -24,5 +24,13 @@ typedef struct IOMMUConfig {
   };
 } IOMMUConfig;
 
+typedef struct IOMMUPageResponse {
+union {
+#ifdef __linux__
+struct iommu_page_response resp;
+#endif
+  };
+} IOMMUPageResponse;
+
 
 #endif /* QEMU_HW_IOMMU_IOMMU_H */
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 1f73c04975..9bc0919352 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -268,6 +268,8 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 
 struct PCIPASIDOps {
 int (*set_pasid_table)(PCIBus *bus, int32_t devfn, IOMMUConfig *config);
+int (*return_page_response)(PCIBus *bus, int32_t devfn,
+IOMMUPageResponse *resp);
 };
 typedef struct PCIPASIDOps PCIPASIDOps;
 
@@ -501,6 +503,8 @@ void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void 
*opaque);
 void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops);
 bool pci_device_is_pasid_ops_set(PCIBus *bus, int32_t devfn);
 int pci_device_set_pasid_table(PCIBus *bus, int32_t devfn, IOMMUConfig 
*config);
+int pci_device_return_page_response(PCIBus *bus, int32_t devfn,
+IOMMUPageResponse *resp);
 
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 114855a0ac..18d84ff42e 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2765,6 +2765,22 @@ int pci_device_set_pasid_table(PCIBus *bus, int32_t 
devfn,
 return -ENOENT;
 }
 
+int pci_device_return_page_response(PCIBus *bus, int32_t devfn,
+IOMMUPageResponse *resp)
+{
+PCIDevice *dev;
+
+if (!bus) {
+return -EINVAL;
+}
+
+dev = bus->devices[devfn];
+if (dev && dev->pasid_ops && dev->pasid_ops->return_page_response) {
+return dev->pasid_ops->return_page_response(bus, devfn, resp);
+}
+return -ENOENT;
+}
+
 static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 {
 Range *range = opaque;
-- 
2.26.3
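
The wrapper above follows a common "optional callback" pattern: validate the arguments, then call the op only if the device registered one. A cut-down sketch of that pattern is shown here. The ops_demo_* types are hypothetical stand-ins for PCIBus, PCIDevice and PCIPASIDOps, and the bounds check on devfn is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the QEMU types. */
struct ops_demo_resp { int code; };

struct ops_demo_table {
    int (*return_page_response)(int devfn, struct ops_demo_resp *resp);
};

struct ops_demo_dev {
    const struct ops_demo_table *pasid_ops;
};

struct ops_demo_bus {
    struct ops_demo_dev *devices[256];
};

/* Same shape as the wrapper in the patch: validate, then dispatch. */
static int ops_demo_return_page_response(struct ops_demo_bus *bus, int devfn,
                                         struct ops_demo_resp *resp)
{
    struct ops_demo_dev *dev;

    /* The devfn range check is specific to this sketch. */
    if (!bus || devfn < 0 || devfn >= 256) {
        return -EINVAL;
    }
    dev = bus->devices[devfn];
    if (dev && dev->pasid_ops && dev->pasid_ops->return_page_response) {
        return dev->pasid_ops->return_page_response(devfn, resp);
    }
    return -ENOENT;     /* no device, or the device did not register the op */
}

/* Example backend: records which devfn it was called for. */
static int ops_demo_backend(int devfn, struct ops_demo_resp *resp)
{
    resp->code = devfn;
    return 0;
}

/* Put one device in slot 8, with or without ops, and dispatch to 'devfn'. */
static int ops_demo(int devfn, int with_ops)
{
    static const struct ops_demo_table table = {
        .return_page_response = ops_demo_backend,
    };
    struct ops_demo_dev dev = { .pasid_ops = with_ops ? &table : NULL };
    struct ops_demo_bus bus = { { NULL } };
    struct ops_demo_resp resp = { -1 };

    bus.devices[8] = &dev;
    return ops_demo_return_page_response(&bus, devfn, &resp);
}
```

Returning -ENOENT for a missing op lets callers tell "nothing registered" apart from a backend failure.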

Better alternative to strncpy in QEMU.

2021-04-11 Thread Chetan

Hello All,

This mail is in reference to one of the tasks mentioned in '
*Contribute/BiteSizedTasks*' in QEMU wiki, under '*API conversion*' which
states to introduce a better alternative to strncpy function. I've drafted
and tested below implementation for the same. Before proceeding with any
changes in QEMU code can you all please go through it and suggest
changes/corrections if required.

/*
 * This function is introduced in place of strncpy(), it asserts if destination
 * is large enough to fit strlen(source)+1 bytes and guarantees null termination
 * in destination string.
 *
 * char source[], is expecting a pointer to the source where data should be
 * copied from.
 *
 * char destination[], is expecting a pointer to the destination where data
 * should be copied to.
 *
 * size_t destination_size, is expecting size of destination.
 * In case of char[], sizeof() function can be used to find the size.
 * In case of char *, provide value which was passed to malloc() function for
 * memory allocation.
 */
char *qemu_strncpy(char destination[], char source[], size_t destination_size)
{
    /* Looping through the array and copying the characters from
     * source to destination.
     */
    for (int i = 0; i < strlen(source); i++) {
        destination[i] = source[i];

        /* Check if value of i is equal to the second last index
         * of destination array and if condition is true, mark last
         * index as NULL and break from the loop.
         */
        if (i == (destination_size - 2)) {
            destination[destination_size - 1] = '\0';
            break;
        }
    }
    return destination;
}

/*
 * This function is introduced in place of strncpy(), it asserts if destination
 * is large enough to fit strlen(source) bytes and does not guarantee null
 * termination in destination string.
 *
 * char source[], is expecting a pointer to the source where data should be
 * copied from.
 *
 * char destination[], is expecting a pointer to the destination where data
 * should be copied to.
 *
 * size_t destination_size, is expecting size of destination.
 * In case of char[], sizeof() function can be used to find the size.
 * In case of char *, provide value which was passed to malloc() function for
 * memory allocation.
 */
char *qemu_strncpy_nonul(char destination[], char source[],
                         size_t destination_size)
{
    /* Looping through the array and copying the characters from
     * source to destination.
     */
    for (int i = 0; i < strlen(source); i++) {
        destination[i] = source[i];

        /* Check if value of i is equal to the last index
         * of the destination array and if condition is true,
         * break from the loop.
         */
        if (i == (destination_size - 1)) {
            break;
        }
    }
    return destination;
}

Regards,
Chetan P.
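
For comparison, here is one possible shape for a pair of helpers matching the comments in the draft ("asserts if destination is large enough"). This is a sketch under assumptions, not the implementation proposed in the mail and not an existing QEMU API: the names copy_checked() and copy_nonul() are hypothetical. It measures the source once, checks the size with assert() instead of silently truncating, and takes the source as const.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing QEMU function): copy 'src' into
 * 'dst', asserting that the string and its terminating NUL fit in
 * 'dst_size' bytes.
 */
static char *copy_checked(char *dst, const char *src, size_t dst_size)
{
    size_t len = strlen(src);       /* measured once, not per iteration */

    assert(len < dst_size);         /* room for len bytes plus the NUL */
    memcpy(dst, src, len + 1);      /* the + 1 copies the terminator */
    return dst;
}

/*
 * Hypothetical companion for fixed-width fields that need no terminator:
 * copy 'src' and zero-pad the rest of the field, as strncpy() does.
 */
static char *copy_nonul(char *dst, const char *src, size_t dst_size)
{
    size_t len = strlen(src);

    assert(len <= dst_size);        /* payload must fit; NUL is optional */
    memcpy(dst, src, len);
    memset(dst + len, 0, dst_size - len);
    return dst;
}
```

With a char array the third argument would typically be sizeof(array), as the draft's comments suggest. Whether an oversized source should assert or truncate is a design choice the mail leaves open.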

Re: [PATCH 1/1] Set TARGET_PAGE_BITS to be 10 instead of 8 bits

2021-04-11 Thread Richard Henderson

On 4/10/21 10:24 AM, Michael Rolnik wrote:

Please review.

The first 256b is i/o, the next 768b are ram.  But having changed the page 
size, it should mean that the first 1k are now treated as i/o.

We do have a path by which instructions in i/o pages can be executed.  This 
happens on some ARM board setups during cold boot.  But we do not save those 
translations, so they run much much slower than they should.

But perhaps in the case of AVR, "much much slower" really isn't visible?

In general, I think changing the page size is wrong.  I also assume that 
migration is largely irrelevant to this target.

r~
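
The page arithmetic behind the point above can be checked with a few lines of C. This is only a sketch: the page_* helpers are hypothetical, and the two addresses are taken from the comment quoted further down (I/O registers end at 0x00ff, SRAM starts at 0x0100).

```c
#include <assert.h>
#include <stdint.h>

/* Last I/O register address and first SRAM address, per the quoted comment. */
#define PAGE_IO_LAST    0x00ffu
#define PAGE_SRAM_FIRST 0x0100u

/* Page number of 'addr' for a given TARGET_PAGE_BITS-style shift. */
static uint32_t page_index(uint32_t addr, unsigned page_bits)
{
    return addr >> page_bits;
}

/* Nonzero if the last I/O byte and the first SRAM byte share a page. */
static int page_io_and_sram_share(unsigned page_bits)
{
    return page_index(PAGE_IO_LAST, page_bits) ==
           page_index(PAGE_SRAM_FIRST, page_bits);
}

/* Bytes of SRAM that land in the same page as the I/O registers. */
static uint32_t page_sram_bytes_in_io_page(unsigned page_bits)
{
    uint32_t page_size = 1u << page_bits;

    return page_size > PAGE_SRAM_FIRST ? page_size - PAGE_SRAM_FIRST : 0;
}
```

With 8 page bits the I/O registers fill page 0 exactly and SRAM starts on page 1. With 10 page bits the first 768 bytes of SRAM fall into the same page as the I/O registers.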

On Tue, Mar 23, 2021 at 10:28 PM Michael Rolnik wrote:

If I set TARGET_PAGE_BITS to 12, the assert(v_l2_levels >= 0) in
page_table_config_init() will fail because TARGET_PHYS_ADDR_SPACE_BITS
is 24, since 24 bits is the longest pointer AVR has. I can set
TARGET_PHYS_ADDR_SPACE_BITS to 32 and TARGET_PAGE_BITS to 12 and
everything will work fine.
What do you think?

btw, I wrote the original comment that David referred to when I did not know
that QEMU could map several regions to the same page. That assumption was
wrong, which is why I could change 8 to 10.

On Tue, Mar 23, 2021 at 10:11 PM Michael Rolnik <mrol...@gmail.com> wrote:

how long?

On Tue, Mar 23, 2021 at 2:46 PM Dr. David Alan Gilbert
<dgilb...@redhat.com> wrote:

* Michael Rolnik (mrol...@gmail.com) wrote:
 > Signed-off-by: Michael Rolnik <mrol...@gmail.com>
 > ---
 >  target/avr/cpu-param.h | 8 +---
 >  target/avr/helper.c    | 2 --
 >  2 files changed, 1 insertion(+), 9 deletions(-)
 >
 > diff --git a/target/avr/cpu-param.h b/target/avr/cpu-param.h
 > index 7ef4e7c679..9765a9d0db 100644
 > --- a/target/avr/cpu-param.h
 > +++ b/target/avr/cpu-param.h
 > @@ -22,13 +22,7 @@
 >  #define AVR_CPU_PARAM_H
 >
 >  #define TARGET_LONG_BITS 32
 > -/*
 > - * TARGET_PAGE_BITS cannot be more than 8 bits because
 > - * 1.  all IO registers occupy [0x .. 0x00ff] address
range, and they
 > - *     should be implemented as a device and not memory
 > - * 2.  SRAM starts at the address 0x0100

I don't know AVR; but that seems to say why you can't make it any
larger
- how do you solve that?

Dave

 > -#define TARGET_PAGE_BITS 8
 > +#define TARGET_PAGE_BITS 10
 >  #define TARGET_PHYS_ADDR_SPACE_BITS 24
 >  #define TARGET_VIRT_ADDR_SPACE_BITS 24
 >  #define NB_MMU_MODES 2
 > diff --git a/target/avr/helper.c b/target/avr/helper.c
 > index 35e1019594..da658afed3 100644
 > --- a/target/avr/helper.c
 > +++ b/target/avr/helper.c
 > @@ -111,8 +111,6 @@ bool avr_cpu_tlb_fill(CPUState *cs, vaddr
address, int size,
 >      MemTxAttrs attrs = {};
 >      uint32_t paddr;
 >
 > -    address &= TARGET_PAGE_MASK;
 > -
 >      if (mmu_idx == MMU_CODE_IDX) {
 >          /* access to code in flash */
 >          paddr = OFFSET_CODE + address;
 > --
 > 2.25.1
 >
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

-- 
Best Regards,

Michael Rolnik

-- 
Best Regards,

Michael Rolnik

--
Best Regards,
Michael Rolnik

Re: [PATCH v1 5/8] target/riscv: Implementation of enhanced PMP (ePMP)

2021-04-11 Thread Alistair Francis

On Fri, Apr 9, 2021 at 2:24 PM Bin Meng  wrote:
>
> On Fri, Apr 2, 2021 at 8:50 PM Alistair Francis
>  wrote:
> >
> > From: Hou Weiying 
> >
> > This commit adds support for ePMP v0.9.1.
> >
> > The ePMP spec can be found in:
> > https://docs.google.com/document/d/1Mh_aiHYxemL0umN3GTTw8vsbmzHZ_nxZXgjgOUzbvc8
> >
> > Signed-off-by: Hongzheng-Li 
> > Signed-off-by: Hou Weiying 
> > Signed-off-by: Myriad-Dreamin 
> > Message-Id: 
> > 
> > [ Changes by AF:
> >  - Rebase on master
> >  - Update to latest spec
> >  - Use a switch case to handle ePMP MML permissions
> >  - Fix a few bugs
> > ]
> > Signed-off-by: Alistair Francis 
> > ---
> >  target/riscv/pmp.c | 165 +
> >  1 file changed, 153 insertions(+), 12 deletions(-)
> >
> > diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
> > index 1d071b044b..3794c808e8 100644
> > --- a/target/riscv/pmp.c
> > +++ b/target/riscv/pmp.c
> > @@ -90,11 +90,42 @@ static inline uint8_t pmp_read_cfg(CPURISCVState *env, 
> > uint32_t pmp_index)
> >  static void pmp_write_cfg(CPURISCVState *env, uint32_t pmp_index, uint8_t 
> > val)
> >  {
> >  if (pmp_index < MAX_RISCV_PMPS) {
> > -if (!pmp_is_locked(env, pmp_index)) {
> > -env->pmp_state.pmp[pmp_index].cfg_reg = val;
> > -pmp_update_rule(env, pmp_index);
> > +bool locked = true;
> > +
> > +if (riscv_feature(env, RISCV_FEATURE_EPMP)) {
> > +/* mseccfg.RLB is set */
> > +if (MSECCFG_RLB_ISSET(env)) {
> > +locked = false;
> > +}
> > +
> > +/* mseccfg.MML is not set */
> > +if (!MSECCFG_MML_ISSET(env) && !pmp_is_locked(env, pmp_index)) 
> > {
> > +locked = false;
> > +}
> > +
> > +/* mseccfg.MML is set */
> > +if (MSECCFG_MML_ISSET(env)) {
> > +/* not adding execute bit */
> > +if ((val & PMP_LOCK) != 0 && (val & PMP_EXEC) != PMP_EXEC) 
> > {
> > +locked = false;
> > +}
> > + /* shared region and not adding X bit*/
>
> nits: /* is not aligned, and a space is needed before */
>
> > +if ((val & PMP_LOCK) != PMP_LOCK &&
> > +(val & 0x7) != (PMP_WRITE | PMP_EXEC)) {
> > +locked = false;
> > +}
> > +}
> >  } else {
> > +if (!pmp_is_locked(env, pmp_index)) {
> > +locked = false;
> > +}
> > +}
> > +
> > +if (locked) {
> >  qemu_log_mask(LOG_GUEST_ERROR, "ignoring pmpcfg write - 
> > locked\n");
> > +} else {
> > +env->pmp_state.pmp[pmp_index].cfg_reg = val;
> > +pmp_update_rule(env, pmp_index);
> >  }
> >  } else {
> >  qemu_log_mask(LOG_GUEST_ERROR,
> > @@ -217,6 +248,33 @@ static bool pmp_hart_has_privs_default(CPURISCVState 
> > *env, target_ulong addr,
> >  {
> >  bool ret;
> >
> > +if (riscv_feature(env, RISCV_FEATURE_EPMP)) {
> > +if (MSECCFG_MMWP_ISSET(env)) {
> > +/*
> > + * The Machine Mode Whitelist Policy (mseccfg.MMWP) is set
> > + * so we default to deny all, even for M mode.
>
> nits: M-mode
>
> > + */
> > +*allowed_privs = 0;
> > +return false;
> > +} else if (MSECCFG_MML_ISSET(env)) {
> > +/*
> > + * The Machine Mode Lockdown (mseccfg.MML) bit is set
> > + * so we can only execute code in M mode with an applicable
>
> nits: M-mode
>
> > + * rule.
> > + * Other modes are disabled.
>
> nits: this line can be put in the same line of "rule."
>
> > + */
> > +if (mode == PRV_M && !(privs & PMP_EXEC)) {
> > +ret = true;
> > +*allowed_privs = PMP_READ | PMP_WRITE;
> > +} else {
> > +ret = false;
> > +*allowed_privs = 0;
> > +}
> > +
> > +return ret;
> > +}
>
> If I understand the spec correctly, I think we are missing a branch to
> handle MML unset case, in which RWX is allowed in M-mode.

Yep, so if MML and MMWP aren't set then we just fall back to the
standard PMP checks which are below. So M-mode accesses will be
allowed and other privs won't be.
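
To spell out the fallback described above, here is a rough Python sketch of
the default-privilege decision (it is not the actual C code). The function
name `default_privs`, the keyword flags and the numeric bit values are
assumptions made for illustration only; the last branch models the standard
PMP check that the ePMP block falls through to.

```python
# Bit and mode values assumed for this sketch.
PMP_READ = 1 << 0
PMP_WRITE = 1 << 1
PMP_EXEC = 1 << 2
PRV_U, PRV_S, PRV_M = 0, 1, 3


def default_privs(mode, privs, *, has_pmp=True, has_epmp=False,
                  mmwp=False, mml=False):
    """Return (granted, allowed_privs) when no PMP entry matches."""
    if has_epmp:
        if mmwp:
            # mseccfg.MMWP set: deny everything, even for M-mode.
            return False, 0
        if mml:
            # mseccfg.MML set: M-mode may read/write but not execute
            # without an applicable rule; other modes are denied.
            if mode == PRV_M and not (privs & PMP_EXEC):
                return True, PMP_READ | PMP_WRITE
            return False, 0
    # Neither MMWP nor MML set: fall back to the standard PMP default.
    if not has_pmp or mode == PRV_M:
        return True, PMP_READ | PMP_WRITE | PMP_EXEC
    return False, 0
```

With ePMP present but both bits clear, an M-mode access gets RWX and an
S-mode or U-mode access gets nothing, which is the case discussed here.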

>
> > +}
> > +
> >  if ((!riscv_feature(env, RISCV_FEATURE_PMP)) || (mode == PRV_M)) {
> >  /*
> >   * Privileged spec v1.10 states if HW doesn't implement any PMP 
> > entry
> > @@ -294,13 +352,94 @@ bool pmp_hart_has_privs(CPURISCVState *env, 
> > target_ulong addr,
> >  pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg);
> >
> >  /*
> > - * If the PMP entry is not off and the address is in range, do the 
> > priv
> > - * check
> > + * Convert the PMP permissions to match the truth table in the
> > + * ePMP spec.
> >

Re: [PATCH v2 03/10] Python: add utility function for retrieving port redirection

2021-04-11 Thread Cleber Rosa

On Thu, Mar 25, 2021 at 02:10:19PM -0400, John Snow wrote:
> On 3/23/21 6:15 PM, Cleber Rosa wrote:
> > Slightly different versions for the same utility code are currently
> > present on different locations.  This unifies them all, giving
> > preference to the version from virtiofs_submounts.py, because of the
> > last tweaks added to it.
> > 
> > While at it, this adds a "qemu.utils" module to host the utility
> > function and a test.
> > 
> > Signed-off-by: Cleber Rosa 
> > Reviewed-by: Wainer dos Santos Moschetta 
> > ---
> >   python/qemu/utils.py | 35 
> >   tests/acceptance/info_usernet.py | 29 
> >   tests/acceptance/linux_ssh_mips_malta.py | 16 +--
> >   tests/acceptance/virtiofs_submounts.py   | 21 --
> >   tests/vm/basevm.py   |  7 ++---
> >   5 files changed, 78 insertions(+), 30 deletions(-)
> >   create mode 100644 python/qemu/utils.py
> >   create mode 100644 tests/acceptance/info_usernet.py
> > 
> > diff --git a/python/qemu/utils.py b/python/qemu/utils.py
> > new file mode 100644
> > index 00..89a246ab30
> > --- /dev/null
> > +++ b/python/qemu/utils.py
> > @@ -0,0 +1,35 @@
> > +"""
> > +QEMU utility library
> > +
> > +This offers miscellaneous utility functions, which may not be easily
> > +distinguishable or numerous to be in their own module.
> > +"""
> > +
> > +# Copyright (C) 2021 Red Hat Inc.
> > +#
> > +# Authors:
> > +#  Cleber Rosa 
> > +#
> > +# This work is licensed under the terms of the GNU GPL, version 2.  See
> > +# the COPYING file in the top-level directory.
> > +#
> > +
> > +import re
> > +from typing import Optional
> > +
> > +
> > +def get_info_usernet_hostfwd_port(info_usernet_output: str) -> 
> > Optional[int]:
> > +"""
> > +Returns the port given to the hostfwd parameter via info usernet
> > +
> > +:param info_usernet_output: output generated by hmp command "info 
> > usernet"
> > +:param info_usernet_output: str
> > +:return: the port number allocated by the hostfwd option
> > +:rtype: int
> 
> I think, unless you know something I don't, that I would prefer to keep type
> information in the "live" annotations where they can be checked against rot.
> 

No, that's a good point.  No need to have type information defined twice.

> > +"""
> > +for line in info_usernet_output.split('\r\n'):
> > +regex = r'TCP.HOST_FORWARD.*127\.0\.0\.1\s+(\d+)\s+10\.'
> > +match = re.search(regex, line)
> > +if match is not None:
> > +return int(match[1])
> > +return None
> 
> I wonder if more guest-specific code doesn't belong elsewhere, but I don't
> have a strong counter-suggestion, so I would probably ACK this for now.
>

There are multiple users of this pattern, and they go beyond the
acceptance tests, so I think unifying them is a bit more important
than having a better location.  Also, like you, I can't think of a
better place at this time.
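
For reference, this is roughly how the unified helper behaves when fed HMP
output. The function body is the one from the patch; the `SAMPLE` text is a
made-up approximation of what "info usernet" prints, so treat its exact
layout as an assumption.

```python
import re
from typing import Optional


def get_info_usernet_hostfwd_port(info_usernet_output: str) -> Optional[int]:
    """Return the host port of the first TCP hostfwd rule, if any."""
    for line in info_usernet_output.split('\r\n'):
        regex = r'TCP.HOST_FORWARD.*127\.0\.0\.1\s+(\d+)\s+10\.'
        match = re.search(regex, line)
        if match is not None:
            return int(match[1])
    return None


# Illustrative output only; the real column layout may differ.
SAMPLE = (
    "Hub -1 (vnet):\r\n"
    "  Protocol[State]    FD  Source Address  Port   "
    "Dest. Address  Port RecvQ SendQ\r\n"
    "  TCP[HOST_FORWARD]  13       127.0.0.1 37439       "
    "10.0.2.15    22     0     0\r\n"
)

port = get_info_usernet_hostfwd_port(SAMPLE)
print(port)  # 37439
```

Callers get an int (or None when no forwarding rule is found), so they no
longer need to convert or index into the split output themselves.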

> (Are you okay with the idea that we won't include the utils module in the
> PyPI upload? I think I would like to avoid shipping something like this
> outside of our castle walls, but agree that having it in the common code
> area somewhere for our own use is good.)
>

At this time I don't have a need for it in the PyPI upload, but I
wonder if this exception is justified.  I mean, what would be gained,
besides dealing with the exception itself, by not including it?

Thanks for the feedback!
- Cleber



Re: [PATCH 2/2] target/arm: Initlaize PMU feature for scratch vcpu

2021-04-11 Thread Gavin Shan


Hi Peter,

On 4/7/21 5:38 PM, Peter Maydell wrote:

On Wed, 7 Apr 2021 at 03:01, Gavin Shan  wrote:


If the scratch vCPU is initialized without PMU feature, we receive
error on reading PMCR_EL0 as it's invisible in this case. It leads
to host probing failure.

This fixes the issue by initializing the scratch vcpu with the PMU
feature enabled and reading PMCR_EL0 from host. Otherwise, its value
is set according to the detected target.

Fixes: f7fb73b8cdd3 ("target/arm: Make number of counters in PMCR follow the 
CPU")


This commit has been reverted...

I couldn't find a cover letter for these patches, so it's
hard to tell what you're aiming to do with them. Could you
make sure you always send a cover letter with a multiple-patch
series, please ? This also helps with our automated tooling.



Sorry for the delay. Yep, I will always include a cover letter for
a series. This particular series is invalid since f7fb73b8cdd3
has been reverted, so please ignore it.

Thanks,
Gavin

Re: [PATCH v2 04/10] Acceptance Tests: move useful ssh methods to base class

2021-04-11 Thread Cleber Rosa

On Wed, Mar 24, 2021 at 10:07:31AM +0100, Auger Eric wrote:
> Hi Cleber,
> 
> On 3/23/21 11:15 PM, Cleber Rosa wrote:
> > Both the virtiofs submounts and the linux ssh mips malta tests
> > contains useful methods related to ssh that deserve to be made
> > available to other tests.  Let's move them to the base LinuxTest
> nit: strictly speaking they are moved to another class which is
> inherited by LinuxTest, right?

I forgot to address this comment previously.  Yes, you're right.
I'll reword it.

Thanks!
- Cleber.

> > class.
> > 
> > The method that helps with setting up an ssh connection will now
> > support both key and password based authentication, defaulting to key
> > based.
> > 
> > Signed-off-by: Cleber Rosa 
> > Reviewed-by: Wainer dos Santos Moschetta 
> > Reviewed-by: Willian Rampazzo 
> > ---
> >  tests/acceptance/avocado_qemu/__init__.py | 48 ++-
> >  tests/acceptance/linux_ssh_mips_malta.py  | 38 ++
> >  tests/acceptance/virtiofs_submounts.py| 37 -
> >  3 files changed, 50 insertions(+), 73 deletions(-)
> > 
> > diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> > b/tests/acceptance/avocado_qemu/__init__.py
> > index 83b1741ec8..67f75f66e5 100644
> > --- a/tests/acceptance/avocado_qemu/__init__.py
> > +++ b/tests/acceptance/avocado_qemu/__init__.py
> > @@ -20,6 +20,7 @@
> >  from avocado.utils import cloudinit
> >  from avocado.utils import datadrainer
> >  from avocado.utils import network
> > +from avocado.utils import ssh
> >  from avocado.utils import vmimage
> >  from avocado.utils.path import find_command
> >  
> > @@ -43,6 +44,8 @@
> >  from qemu.accel import kvm_available
> >  from qemu.accel import tcg_available
> >  from qemu.machine import QEMUMachine
> > +from qemu.utils import get_info_usernet_hostfwd_port
> > +
> >  
> >  def is_readable_executable_file(path):
> >  return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)
> > @@ -253,7 +256,50 @@ def fetch_asset(self, name,
> >  cancel_on_missing=cancel_on_missing)
> >  
> >  
> > -class LinuxTest(Test):
> > +class LinuxSSHMixIn:
> > +"""Contains utility methods for interacting with a guest via SSH."""
> > +
> > +def ssh_connect(self, username, credential, credential_is_key=True):
> > +self.ssh_logger = logging.getLogger('ssh')
> > +res = self.vm.command('human-monitor-command',
> > +  command_line='info usernet')
> > +port = get_info_usernet_hostfwd_port(res)
> > +self.assertIsNotNone(port)
> > +self.assertGreater(port, 0)
> > +self.log.debug('sshd listening on port: %d', port)
> > +if credential_is_key:
> > +self.ssh_session = ssh.Session('127.0.0.1', port=port,
> > +   user=username, key=credential)
> > +else:
> > +self.ssh_session = ssh.Session('127.0.0.1', port=port,
> > +   user=username, 
> > password=credential)
> > +for i in range(10):
> > +try:
> > +self.ssh_session.connect()
> > +return
> > +except:
> > +time.sleep(4)
> > +pass
> > +self.fail('ssh connection timeout')
> > +
> > +def ssh_command(self, command):
> > +self.ssh_logger.info(command)
> > +result = self.ssh_session.cmd(command)
> > +stdout_lines = [line.rstrip() for line
> > +in result.stdout_text.splitlines()]
> > +for line in stdout_lines:
> > +self.ssh_logger.info(line)
> > +stderr_lines = [line.rstrip() for line
> > +in result.stderr_text.splitlines()]
> > +for line in stderr_lines:
> > +self.ssh_logger.warning(line)
> > +
> > +self.assertEqual(result.exit_status, 0,
> > + f'Guest command failed: {command}')
> > +return stdout_lines, stderr_lines
> > +
> > +
> > +class LinuxTest(Test, LinuxSSHMixIn):
> >  """Facilitates having a cloud-image Linux based available.
> >  
> >  For tests that indend to interact with guests, this is a better choice
> > diff --git a/tests/acceptance/linux_ssh_mips_malta.py 
> > b/tests/acceptance/linux_ssh_mips_malta.py
> > index 052008f02d..3f590a081f 100644
> > --- a/tests/acceptance/linux_ssh_mips_malta.py
> > +++ b/tests/acceptance/linux_ssh_mips_malta.py
> > @@ -12,7 +12,7 @@
> >  import time
> >  
> >  from avocado import skipUnless
> > -from avocado_qemu import Test
> > +from avocado_qemu import Test, LinuxSSHMixIn
> >  from avocado_qemu import wait_for_console_pattern
> >  from avocado.utils import process
> >  from avocado.utils import archive
> > @@ -21,7 +21,7 @@
> >  from qemu.utils import get_info_usernet_hostfwd_port
> Can't you remove this now?
> >  
> >  
> > -class LinuxSSH(Test):
> > +class LinuxSSH(Test, LinuxSSHMixIn):
> out of curiosit

Re: [PATCH v2 05/10] Acceptance Tests: add port redirection for ssh by default

2021-04-11 Thread Cleber Rosa

On Thu, Mar 25, 2021 at 02:57:48PM -0300, Wainer dos Santos Moschetta wrote:
> Hi,
> 
> On 3/24/21 6:10 AM, Auger Eric wrote:
> > Hi Cleber,
> > 
> > On 3/23/21 11:15 PM, Cleber Rosa wrote:
> > > For users of the LinuxTest class, let's set up the VM with the port
> > > redirection for SSH, instead of requiring each test to set the same
> > also sets the network device to virtio-net. This may be worth mentioning
> > here in the commit msg.
> > > arguments.
> > > 
> > > Signed-off-by: Cleber Rosa 
> > Reviewed-by: Eric Auger 
> > 
> > Thanks
> > 
> > Eric
> > 
> > > ---
> > >   tests/acceptance/avocado_qemu/__init__.py | 4 +++-
> > >   tests/acceptance/virtiofs_submounts.py| 4 
> > >   2 files changed, 3 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> > > b/tests/acceptance/avocado_qemu/__init__.py
> > > index 67f75f66e5..e75b002c70 100644
> > > --- a/tests/acceptance/avocado_qemu/__init__.py
> > > +++ b/tests/acceptance/avocado_qemu/__init__.py
> > > @@ -309,10 +309,12 @@ class LinuxTest(Test, LinuxSSHMixIn):
> > >   timeout = 900
> > >   chksum = None
> > > -def setUp(self, ssh_pubkey=None):
> > > +def setUp(self, ssh_pubkey=None, network_device_type='virtio-net'):
> > >   super(LinuxTest, self).setUp()
> > >   self.vm.add_args('-smp', '2')
> > >   self.vm.add_args('-m', '1024')
> > > +self.vm.add_args('-netdev', 
> > > 'user,id=vnet,hostfwd=:127.0.0.1:0-:22',
> > > + '-device', '%s,netdev=vnet' % 
> > > network_device_type)
> > >   self.set_up_boot()
> > >   if ssh_pubkey is None:
> > >   ssh_pubkey, self.ssh_key = self.set_up_existing_ssh_keys()
> > > diff --git a/tests/acceptance/virtiofs_submounts.py 
> > > b/tests/acceptance/virtiofs_submounts.py
> > > index bed8ce44df..e10a935ac4 100644
> > > --- a/tests/acceptance/virtiofs_submounts.py
> > > +++ b/tests/acceptance/virtiofs_submounts.py
> > > @@ -207,10 +207,6 @@ def setUp(self):
> > >   self.vm.add_args('-kernel', vmlinuz,
> > >'-append', 'console=ttyS0 root=/dev/sda1')
> > > -# Allow us to connect to SSH
> 
> Somewhat related with Eric's suggestion: keep the above comment along with
> the netdev setup code.
> 
> - Wainer
>

Sure, good point.

Thanks,
- Cleber.



Re: [PATCH v2 09/10] Acceptance Tests: add basic documentation on LinuxTest base class

2021-04-11 Thread Cleber Rosa

On Thu, Mar 25, 2021 at 03:14:58PM -0300, Wainer dos Santos Moschetta wrote:
> Hi,
> 
> On 3/23/21 7:15 PM, Cleber Rosa wrote:
> > Signed-off-by: Cleber Rosa 
> > Reviewed-by: Marc-André Lureau 
> > Reviewed-by: Willian Rampazzo 
> > ---
> >   docs/devel/testing.rst | 25 +
> >   1 file changed, 25 insertions(+)
> > 
> > diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
> > index 1da4c4e4c4..ed2a06db28 100644
> > --- a/docs/devel/testing.rst
> > +++ b/docs/devel/testing.rst
> > @@ -810,6 +810,31 @@ and hypothetical example follows:
> >   At test "tear down", ``avocado_qemu.Test`` handles all the QEMUMachines
> >   shutdown.
> > +The ``avocado_qemu.LinuxTest`` base test class
> > +~~
> > +
> > +The ``avocado_qemu.LinuxTest`` is further specialization of the
> > +``avocado_qemu.Test`` class, so it contains all the characteristics of
> > +the later plus some extra features.
> > +
> > +First of all, this base class is intended for tests that need to
> > +interact with a fully booted and operational Linux guest.  The most
> > +basic example looks like this:
> 
> I think it is worth mentioning currently it will boot a Fedora 31 cloud-init
> image.
>

Sure, makes sense.

Thanks!
- Cleber.



Re: [PATCH v2 00/10] Acceptance Test: introduce base class for Linux based tests

2021-04-11 Thread Cleber Rosa

On Thu, Mar 25, 2021 at 04:45:51PM -0300, Wainer dos Santos Moschetta wrote:
> Hi,
> 
> On 3/23/21 7:15 PM, Cleber Rosa wrote:
> > This introduces a base class for tests that need to interact with a
> > Linux guest.  It generalizes the "boot_linux.py" code, already been
> > used by the "virtiofs_submounts.py" and also SSH related code being
> > used by that and "linux_ssh_mips_malta.py".
> 
> I ran the linux_ssh_mips_malta.py tests, they all passed:
> 
> (11/34) 
> tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta32eb_kernel3_2_0:
> PASS (64.41 s)
> (12/34) 
> tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta32el_kernel3_2_0:
> PASS (63.43 s)
> (13/34) 
> tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta64eb_kernel3_2_0:
> PASS (63.76 s)
> (14/34) 
> tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta64el_kernel3_2_0:
> PASS (62.52 s)
> 
> Then I tried the virtiofs_submounts.py tests, it finishes with error.
> Something like that fixes it:
> 
> diff --git a/tests/acceptance/virtiofs_submounts.py
> b/tests/acceptance/virtiofs_submounts.py
> index d77ee35674..21ad7d792e 100644
> --- a/tests/acceptance/virtiofs_submounts.py
> +++ b/tests/acceptance/virtiofs_submounts.py
> @@ -195,7 +195,7 @@ def setUp(self):
> 
>  self.run(('ssh-keygen', '-N', '', '-t', 'ed25519', '-f',
> self.ssh_key))
> 
> -    pubkey = open(self.ssh_key + '.pub').read()
> +    pubkey = self.ssh_key + '.pub'
> 
>  super(VirtiofsSubmountsTest, self).setUp(pubkey)
> 

Hi Wainer,

Yes, thank you so much for catching that and proposing a fix.  I'm
adding it to v3 of this series.
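
The reason the fix works is that the cloud-init setup opens the value it is
given, so it wants a path rather than the key contents. A small standalone
sketch of that mismatch follows; `read_pubkey` is a hypothetical stand-in
for the relevant part of prepare_cloudinit(), not the real method.

```python
import os
import tempfile


def read_pubkey(ssh_pubkey):
    """Stand-in for the setup step: expects a *path* to the public key."""
    with open(ssh_pubkey) as pubkey:
        return pubkey.read()


with tempfile.TemporaryDirectory() as workdir:
    key_path = os.path.join(workdir, 'id_ed25519.pub')
    with open(key_path, 'w') as key_file:
        key_file.write('ssh-ed25519 AAAAexample user@host\n')

    # Passing the path works and yields the key contents.
    content_from_path = read_pubkey(key_path)

    # Passing the contents makes open() look for a file with that name.
    try:
        read_pubkey(content_from_path)
        passing_contents_failed = False
    except OSError:
        passing_contents_failed = True
```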

Thanks again!
- Cleber.



[PATCH v3 01/11] tests/acceptance/virtiofs_submounts.py: add missing accel tag

2021-04-11 Thread Cleber Rosa

The tag is useful to select tests that depend on or use a particular
feature.

Signed-off-by: Cleber Rosa 
Reviewed-by: Wainer dos Santos Moschetta 
Reviewed-by: Willian Rampazzo 
Reviewed-by: Eric Auger 
---
 tests/acceptance/virtiofs_submounts.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/acceptance/virtiofs_submounts.py 
b/tests/acceptance/virtiofs_submounts.py
index 46fa65392a1..5b74ce2929b 100644
--- a/tests/acceptance/virtiofs_submounts.py
+++ b/tests/acceptance/virtiofs_submounts.py
@@ -70,6 +70,7 @@ def test_something_that_needs_cmd1_and_cmd2(self):
 class VirtiofsSubmountsTest(LinuxTest):
 """
 :avocado: tags=arch:x86_64
+:avocado: tags=accel:kvm
 """
 
 def get_portfwd(self):
-- 
2.30.2

[PATCH v3 00/11] Acceptance Test: introduce base class for Linux based tests

2021-04-11 Thread Cleber Rosa

This introduces a base class for tests that need to interact with a
Linux guest.  It generalizes the "boot_linux.py" code, already being
used by the "virtiofs_submounts.py" test, and also the SSH related code
being used by that and "linux_ssh_mips_malta.py".

While at it, a number of fixes and hopeful improvements to those tests
were added.

Changes from v2:

* Removed type information in docstring on python/qemu/utils.py, as
  that's already present on the type hints (John Snow)

* Reworded commit message about moving ssh-related methods to an auxiliary,
  mix-in class, and not to the base LinuxTest class (Eric Auger)

* Removed unused import of get_info_usernet_hostfwd_port on
  tests/acceptance/linux_ssh_mips_malta.py (Eric Auger)

* Added note on commit message about setUp() method also allowing one
  to define network device, which is by default, set to virtio-net
  (Eric Auger)

* Kept note about the network device that allows for SSH connections
  (Wainer Moschetta)

* Do not set up an SSH connection on tests that won't be using it
  (Eric Auger)

* Mention the use of a Fedora 31 guest image (Wainer Moschetta)

* Fix of SSH pubkey setup on tests/acceptance/virtiofs_submounts.py
  (new patch, reported by Wainer Moschetta)

Changes from v1:

* Majority of v1 patches have been merged.

* New patches:
  - Acceptance Tests: make username/password configurable
  - Acceptance Tests: set up SSH connection by default after boot for LinuxTest
  - tests/acceptance/virtiofs_submounts.py: remove launch_vm()

* Allowed for the configuration of the network device type (defaulting
  to virtio-net) [Phil]

* Fix module name typo (s/qemu.util/qemu.utils/) in the commit message
  [John]

* Tests based on LinuxTest will have the SSH connection already prepared

Cleber Rosa (11):
  tests/acceptance/virtiofs_submounts.py: add missing accel tag
  tests/acceptance/virtiofs_submounts.py: evaluate string not length
  Python: add utility function for retrieving port redirection
  Acceptance Tests: move useful ssh methods to base class
  Acceptance Tests: add port redirection for ssh by default
  Acceptance Tests: make username/password configurable
  Acceptance Tests: set up SSH connection by default after boot for
LinuxTest
  tests/acceptance/virtiofs_submounts.py: remove launch_vm()
  Acceptance Tests: add basic documentation on LinuxTest base class
  Acceptance Tests: introduce CPU hotplug test
  tests/acceptance/virtiofs_submounts.py: fix setup of SSH pubkey

 docs/devel/testing.rst| 26 +
 python/qemu/utils.py  | 33 +++
 tests/acceptance/avocado_qemu/__init__.py | 64 ++--
 tests/acceptance/boot_linux.py| 18 +++---
 tests/acceptance/hotplug_cpu.py   | 37 
 tests/acceptance/info_usernet.py  | 29 +
 tests/acceptance/linux_ssh_mips_malta.py  | 42 +-
 tests/acceptance/virtiofs_submounts.py| 71 +++
 tests/vm/basevm.py|  7 +--
 9 files changed, 206 insertions(+), 121 deletions(-)
 create mode 100644 python/qemu/utils.py
 create mode 100644 tests/acceptance/hotplug_cpu.py
 create mode 100644 tests/acceptance/info_usernet.py

-- 
2.30.2

[PATCH v3 03/11] Python: add utility function for retrieving port redirection

2021-04-11 Thread Cleber Rosa

Slightly different versions of the same utility code are currently
present in different locations.  This unifies them all, giving
preference to the version from virtiofs_submounts.py, because of the
last tweaks added to it.

While at it, this adds a "qemu.utils" module to host the utility
function and a test.

Signed-off-by: Cleber Rosa 
Reviewed-by: Wainer dos Santos Moschetta 
Reviewed-by: Eric Auger 
Reviewed-by: Willian Rampazzo 
---
 python/qemu/utils.py | 33 
 tests/acceptance/info_usernet.py | 29 +
 tests/acceptance/linux_ssh_mips_malta.py | 16 +---
 tests/acceptance/virtiofs_submounts.py   | 21 ---
 tests/vm/basevm.py   |  7 ++---
 5 files changed, 76 insertions(+), 30 deletions(-)
 create mode 100644 python/qemu/utils.py
 create mode 100644 tests/acceptance/info_usernet.py

diff --git a/python/qemu/utils.py b/python/qemu/utils.py
new file mode 100644
index 000..5ed789275ee
--- /dev/null
+++ b/python/qemu/utils.py
@@ -0,0 +1,33 @@
+"""
+QEMU utility library
+
+This offers miscellaneous utility functions, which may not be easily
+distinguishable or numerous to be in their own module.
+"""
+
+# Copyright (C) 2021 Red Hat Inc.
+#
+# Authors:
+#  Cleber Rosa 
+#
+# This work is licensed under the terms of the GNU GPL, version 2.  See
+# the COPYING file in the top-level directory.
+#
+
+import re
+from typing import Optional
+
+
+def get_info_usernet_hostfwd_port(info_usernet_output: str) -> Optional[int]:
+"""
+Returns the port given to the hostfwd parameter via info usernet
+
+:param info_usernet_output: output generated by hmp command "info usernet"
+:return: the port number allocated by the hostfwd option
+"""
+for line in info_usernet_output.split('\r\n'):
+regex = r'TCP.HOST_FORWARD.*127\.0\.0\.1\s+(\d+)\s+10\.'
+match = re.search(regex, line)
+if match is not None:
+return int(match[1])
+return None
diff --git a/tests/acceptance/info_usernet.py b/tests/acceptance/info_usernet.py
new file mode 100644
index 000..9c1fd903a0b
--- /dev/null
+++ b/tests/acceptance/info_usernet.py
@@ -0,0 +1,29 @@
+# Test for the hmp command "info usernet"
+#
+# Copyright (c) 2021 Red Hat, Inc.
+#
+# Author:
+#  Cleber Rosa 
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+from avocado_qemu import Test
+
+from qemu.utils import get_info_usernet_hostfwd_port
+
+
+class InfoUsernet(Test):
+
+def test_hostfwd(self):
+self.vm.add_args('-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22')
+self.vm.launch()
+res = self.vm.command('human-monitor-command',
+  command_line='info usernet')
+port = get_info_usernet_hostfwd_port(res)
+self.assertIsNotNone(port,
+ ('"info usernet" output content does not seem to '
+  'contain the redirected port'))
+self.assertGreater(port, 0,
+   ('Found a redirected port that is not greater than'
+' zero'))
diff --git a/tests/acceptance/linux_ssh_mips_malta.py 
b/tests/acceptance/linux_ssh_mips_malta.py
index 6dbd02d49d5..052008f02d4 100644
--- a/tests/acceptance/linux_ssh_mips_malta.py
+++ b/tests/acceptance/linux_ssh_mips_malta.py
@@ -18,6 +18,8 @@
 from avocado.utils import archive
 from avocado.utils import ssh
 
+from qemu.utils import get_info_usernet_hostfwd_port
+
 
 class LinuxSSH(Test):
 
@@ -70,18 +72,14 @@ def get_kernel_info(self, endianess, wordsize):
 def setUp(self):
 super(LinuxSSH, self).setUp()
 
-def get_portfwd(self):
+def ssh_connect(self, username, password):
+self.ssh_logger = logging.getLogger('ssh')
 res = self.vm.command('human-monitor-command',
   command_line='info usernet')
-line = res.split('\r\n')[2]
-port = re.split(r'.*TCP.HOST_FORWARD.*127\.0\.0\.1 (\d+)\s+10\..*',
-line)[1]
+port = get_info_usernet_hostfwd_port(res)
+if not port:
+self.cancel("Failed to retrieve SSH port")
 self.log.debug("sshd listening on port:" + port)
-return port
-
-def ssh_connect(self, username, password):
-self.ssh_logger = logging.getLogger('ssh')
-port = self.get_portfwd()
 self.ssh_session = ssh.Session(self.VM_IP, port=int(port),
user=username, password=password)
 for i in range(10):
diff --git a/tests/acceptance/virtiofs_submounts.py 
b/tests/acceptance/virtiofs_submounts.py
index ca64b76301f..57a7047342f 100644
--- a/tests/acceptance/virtiofs_submounts.py
+++ b/tests/acceptance/virtiofs_submounts.py
@@ -9,6 +9,8 @@
 from avocado_qemu import wait_for_console_pattern
 from avocado.utils import ssh
 
+from

[PATCH v3 05/11] Acceptance Tests: add port redirection for ssh by default

2021-04-11 Thread Cleber Rosa

For users of the LinuxTest class, let's set up the VM with the port
redirection for SSH, instead of requiring each test to set the same
arguments.

It also sets the network device, by default, to virtio-net.

Signed-off-by: Cleber Rosa 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Eric Auger 
Reviewed-by: Willian Rampazzo 
---
 tests/acceptance/avocado_qemu/__init__.py | 5 -
 tests/acceptance/virtiofs_submounts.py| 4 
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tests/acceptance/avocado_qemu/__init__.py 
b/tests/acceptance/avocado_qemu/__init__.py
index 67f75f66e56..085688f 100644
--- a/tests/acceptance/avocado_qemu/__init__.py
+++ b/tests/acceptance/avocado_qemu/__init__.py
@@ -309,10 +309,13 @@ class LinuxTest(Test, LinuxSSHMixIn):
 timeout = 900
 chksum = None
 
-def setUp(self, ssh_pubkey=None):
+def setUp(self, ssh_pubkey=None, network_device_type='virtio-net'):
 super(LinuxTest, self).setUp()
 self.vm.add_args('-smp', '2')
 self.vm.add_args('-m', '1024')
+# The following network device allows for SSH connections
+self.vm.add_args('-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22',
+ '-device', '%s,netdev=vnet' % network_device_type)
 self.set_up_boot()
 if ssh_pubkey is None:
 ssh_pubkey, self.ssh_key = self.set_up_existing_ssh_keys()
diff --git a/tests/acceptance/virtiofs_submounts.py 
b/tests/acceptance/virtiofs_submounts.py
index bed8ce44dfc..e10a935ac4e 100644
--- a/tests/acceptance/virtiofs_submounts.py
+++ b/tests/acceptance/virtiofs_submounts.py
@@ -207,10 +207,6 @@ def setUp(self):
 self.vm.add_args('-kernel', vmlinuz,
  '-append', 'console=ttyS0 root=/dev/sda1')
 
-# Allow us to connect to SSH
-self.vm.add_args('-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22',
- '-device', 'virtio-net,netdev=vnet')
-
 self.require_accelerator("kvm")
 self.vm.add_args('-accel', 'kvm')
 
-- 
2.30.2
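
As a quick illustration of what the new setUp() default in the patch above
amounts to, here is a standalone sketch of the argument construction. The
helper name `ssh_netdev_args` is hypothetical; the argument strings are the
ones from the patch.

```python
def ssh_netdev_args(network_device_type='virtio-net'):
    """Build the QEMU arguments that forward a host port to guest port 22."""
    return ['-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22',
            '-device', '%s,netdev=vnet' % network_device_type]


default_args = ssh_netdev_args()
e1000_args = ssh_netdev_args('e1000')
print(default_args[-1])  # virtio-net,netdev=vnet
print(e1000_args[-1])    # e1000,netdev=vnet
```

The host side of the forwarding rule uses port 0, so the actual port is not
known up front and has to be read back from "info usernet" after launch.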

[PATCH v3 02/11] tests/acceptance/virtiofs_submounts.py: evaluate string not length

2021-04-11 Thread Cleber Rosa

If the vmlinuz variable is set to anything that evaluates to True,
then the respective arguments should be set.  If the variable contains
an empty string, then it will evaluate to False, and the extra
arguments will not be set.

This keeps the same logic, but improves readability a bit.

Signed-off-by: Cleber Rosa 
Reviewed-by: Beraldo Leal 
Reviewed-by: Eric Auger 
Reviewed-by: Willian Rampazzo 
---
 tests/acceptance/virtiofs_submounts.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/acceptance/virtiofs_submounts.py 
b/tests/acceptance/virtiofs_submounts.py
index 5b74ce2929b..ca64b76301f 100644
--- a/tests/acceptance/virtiofs_submounts.py
+++ b/tests/acceptance/virtiofs_submounts.py
@@ -251,7 +251,7 @@ def setUp(self):
 
 super(VirtiofsSubmountsTest, self).setUp(pubkey)
 
-if len(vmlinuz) > 0:
+if vmlinuz:
 self.vm.add_args('-kernel', vmlinuz,
  '-append', 'console=ttyS0 root=/dev/sda1')
 
-- 
2.30.2
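
A minimal sketch of the truthiness check from the patch above, pulled out
into a hypothetical helper (`kernel_args` is not part of the test) so the
empty and non-empty cases can be compared side by side:

```python
def kernel_args(vmlinuz):
    """Return the extra QEMU arguments for a direct kernel boot, if any."""
    args = []
    if vmlinuz:  # an empty string (or None) evaluates to False
        args += ['-kernel', vmlinuz,
                 '-append', 'console=ttyS0 root=/dev/sda1']
    return args


with_kernel = kernel_args('/boot/vmlinuz')
without_kernel = kernel_args('')
print(len(with_kernel), len(without_kernel))  # 4 0
```

For strings the two forms agree. One difference worth keeping in mind is
that `len(None)` raises TypeError, while `if None:` is simply False.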

[PATCH v3 08/11] tests/acceptance/virtiofs_submounts.py: remove launch_vm()

2021-04-11 Thread Cleber Rosa

The LinuxTest class' launch_and_wait() method now behaves the same way
as this test's custom launch_vm(), so let's just use the upper layer
(common) method.

Signed-off-by: Cleber Rosa 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Eric Auger 
Reviewed-by: Willian Rampazzo 
---
 tests/acceptance/virtiofs_submounts.py | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/tests/acceptance/virtiofs_submounts.py 
b/tests/acceptance/virtiofs_submounts.py
index e019d3b896b..d77ee356740 100644
--- a/tests/acceptance/virtiofs_submounts.py
+++ b/tests/acceptance/virtiofs_submounts.py
@@ -134,9 +134,6 @@ def set_up_virtiofs(self):
  '-numa',
  'node,memdev=mem')
 
-def launch_vm(self):
-self.launch_and_wait()
-
 def set_up_nested_mounts(self):
 scratch_dir = os.path.join(self.shared_dir, 'scratch')
 try:
@@ -225,7 +222,7 @@ def test_pre_virtiofsd_set_up(self):
 self.set_up_nested_mounts()
 
 self.set_up_virtiofs()
-self.launch_vm()
+self.launch_and_wait()
 self.mount_in_guest()
 self.check_in_guest()
 
@@ -235,14 +232,14 @@ def test_pre_launch_set_up(self):
 
 self.set_up_nested_mounts()
 
-self.launch_vm()
+self.launch_and_wait()
 self.mount_in_guest()
 self.check_in_guest()
 
 def test_post_launch_set_up(self):
 self.set_up_shared_dir()
 self.set_up_virtiofs()
-self.launch_vm()
+self.launch_and_wait()
 
 self.set_up_nested_mounts()
 
@@ -252,7 +249,7 @@ def test_post_launch_set_up(self):
 def test_post_mount_set_up(self):
 self.set_up_shared_dir()
 self.set_up_virtiofs()
-self.launch_vm()
+self.launch_and_wait()
 self.mount_in_guest()
 
 self.set_up_nested_mounts()
@@ -265,7 +262,7 @@ def test_two_runs(self):
 self.set_up_nested_mounts()
 
 self.set_up_virtiofs()
-self.launch_vm()
+self.launch_and_wait()
 self.mount_in_guest()
 self.check_in_guest()
 
-- 
2.30.2

[PATCH v3 06/11] Acceptance Tests: make username/password configurable

2021-04-11 Thread Cleber Rosa

This makes the username/password used for authentication configurable,
because some guest operating systems may have restrictions on accounts
to be used for logins, and it just makes it better documented.

Signed-off-by: Cleber Rosa 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Eric Auger 
Reviewed-by: Willian Rampazzo 
---
 tests/acceptance/avocado_qemu/__init__.py | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/acceptance/avocado_qemu/__init__.py 
b/tests/acceptance/avocado_qemu/__init__.py
index 085688f..25f871f5bc6 100644
--- a/tests/acceptance/avocado_qemu/__init__.py
+++ b/tests/acceptance/avocado_qemu/__init__.py
@@ -308,6 +308,8 @@ class LinuxTest(Test, LinuxSSHMixIn):
 
 timeout = 900
 chksum = None
+username = 'root'
+password = 'password'
 
 def setUp(self, ssh_pubkey=None, network_device_type='virtio-net'):
 super(LinuxTest, self).setUp()
@@ -371,8 +373,8 @@ def prepare_cloudinit(self, ssh_pubkey=None):
 with open(ssh_pubkey) as pubkey:
 pubkey_content = pubkey.read()
 cloudinit.iso(cloudinit_iso, self.name,
-  username='root',
-  password='password',
+  username=self.username,
+  password=self.password,
   # QEMU's hard coded usermode router address
   phone_home_host='10.0.2.2',
   phone_home_port=self.phone_home_port,
-- 
2.30.2
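
With the credentials turned into class attributes, a test can override them
by subclassing. The sketch below uses plain classes rather than the real
avocado_qemu ones; `LinuxTestSketch`, `CustomGuestTest`, the 'fedora' user
and `cloudinit_credentials` are all made up for illustration.

```python
class LinuxTestSketch:
    """Plain stand-in for LinuxTest, keeping only the credential defaults."""
    username = 'root'
    password = 'password'

    def cloudinit_credentials(self):
        # Mirrors how the patch passes self.username / self.password along.
        return {'username': self.username, 'password': self.password}


class CustomGuestTest(LinuxTestSketch):
    """A test for a guest that does not allow root logins (hypothetical)."""
    username = 'fedora'


default_creds = LinuxTestSketch().cloudinit_credentials()
custom_creds = CustomGuestTest().cloudinit_credentials()
print(custom_creds['username'])  # fedora
```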

[PATCH v3 10/11] Acceptance Tests: introduce CPU hotplug test

2021-04-11 Thread Cleber Rosa

Even though there are qtest based tests for hotplugging CPUs (from
which this test took some inspiration), this one adds checks
from a Linux guest point of view.

It should also serve as an example for tests that follow a similar
pattern and need to interact with QEMU (via qmp) and with the Linux
guest via SSH.

Signed-off-by: Cleber Rosa 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Willian Rampazzo 
Reviewed-by: Eric Auger 
---
 tests/acceptance/hotplug_cpu.py | 37 +
 1 file changed, 37 insertions(+)
 create mode 100644 tests/acceptance/hotplug_cpu.py

diff --git a/tests/acceptance/hotplug_cpu.py b/tests/acceptance/hotplug_cpu.py
new file mode 100644
index 000..6374bf1b546
--- /dev/null
+++ b/tests/acceptance/hotplug_cpu.py
@@ -0,0 +1,37 @@
+# Functional test that hotplugs a CPU and checks it on a Linux guest
+#
+# Copyright (c) 2021 Red Hat, Inc.
+#
+# Author:
+#  Cleber Rosa 
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+from avocado_qemu import LinuxTest
+
+
+class HotPlugCPU(LinuxTest):
+
+def test(self):
+"""
+:avocado: tags=arch:x86_64
+:avocado: tags=machine:q35
+:avocado: tags=accel:kvm
+"""
+self.require_accelerator('kvm')
+self.vm.add_args('-accel', 'kvm')
+self.vm.add_args('-cpu', 'Haswell')
+self.vm.add_args('-smp', '1,sockets=1,cores=2,threads=1,maxcpus=2')
+self.launch_and_wait()
+
+self.ssh_command('test -e /sys/devices/system/cpu/cpu0')
+with self.assertRaises(AssertionError):
+self.ssh_command('test -e /sys/devices/system/cpu/cpu1')
+
+self.vm.command('device_add',
+driver='Haswell-x86_64-cpu',
+socket_id=0,
+core_id=1,
+thread_id=0)
+self.ssh_command('test -e /sys/devices/system/cpu/cpu1')
-- 
2.30.2

[PATCH v3 04/11] Acceptance Tests: move useful ssh methods to base class

2021-04-11 Thread Cleber Rosa

Both the virtiofs submounts and the linux ssh mips malta tests
contain useful methods related to ssh that deserve to be made
available to other tests.  Let's move them to an auxiliary, mix-in
class that will be used on the base LinuxTest class.

The method that helps with setting up an ssh connection will now
support both key and password based authentication, defaulting to key
based.
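
The connection logic described above boils down to two small decisions: which keyword carries the credential, and how often to retry. A minimal sketch follows, assuming hypothetical helper names (`connect_with_retry`, `session_kwargs`) that are not part of the patch; the retry count and delay mirror the values in the diff, while catching only `OSError` is an assumption of this sketch (the patch uses a bare `except:`).

```python
import time


def connect_with_retry(connect, attempts=10, delay=4.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    Returns the 1-based number of the attempt that succeeded and raises
    TimeoutError otherwise.  Hypothetical helper, not the patch's code.
    """
    for attempt in range(1, attempts + 1):
        try:
            connect()
            return attempt
        except OSError:
            # Assumption: only connection-level errors are retried here.
            sleep(delay)
    raise TimeoutError('ssh connection timeout')


def session_kwargs(username, credential, credential_is_key=True):
    """Choose the keyword for the credential, defaulting to key based."""
    kind = 'key' if credential_is_key else 'password'
    return {'user': username, kind: credential}
```

Passing `sleep` as a parameter is only there so the loop can be exercised without actually waiting.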

Signed-off-by: Cleber Rosa 
Reviewed-by: Wainer dos Santos Moschetta 
Reviewed-by: Willian Rampazzo 
Reviewed-by: Eric Auger 
Signed-off-by: Cleber Rosa 
---
 tests/acceptance/avocado_qemu/__init__.py | 48 ++-
 tests/acceptance/linux_ssh_mips_malta.py  | 40 ++-
 tests/acceptance/virtiofs_submounts.py| 37 -
 3 files changed, 50 insertions(+), 75 deletions(-)

diff --git a/tests/acceptance/avocado_qemu/__init__.py 
b/tests/acceptance/avocado_qemu/__init__.py
index 83b1741ec85..67f75f66e56 100644
--- a/tests/acceptance/avocado_qemu/__init__.py
+++ b/tests/acceptance/avocado_qemu/__init__.py
@@ -20,6 +20,7 @@
 from avocado.utils import cloudinit
 from avocado.utils import datadrainer
 from avocado.utils import network
+from avocado.utils import ssh
 from avocado.utils import vmimage
 from avocado.utils.path import find_command
 
@@ -43,6 +44,8 @@
 from qemu.accel import kvm_available
 from qemu.accel import tcg_available
 from qemu.machine import QEMUMachine
+from qemu.utils import get_info_usernet_hostfwd_port
+
 
 def is_readable_executable_file(path):
 return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)
@@ -253,7 +256,50 @@ def fetch_asset(self, name,
 cancel_on_missing=cancel_on_missing)
 
 
-class LinuxTest(Test):
+class LinuxSSHMixIn:
+    """Contains utility methods for interacting with a guest via SSH."""
+
+    def ssh_connect(self, username, credential, credential_is_key=True):
+        self.ssh_logger = logging.getLogger('ssh')
+        res = self.vm.command('human-monitor-command',
+                              command_line='info usernet')
+        port = get_info_usernet_hostfwd_port(res)
+        self.assertIsNotNone(port)
+        self.assertGreater(port, 0)
+        self.log.debug('sshd listening on port: %d', port)
+        if credential_is_key:
+            self.ssh_session = ssh.Session('127.0.0.1', port=port,
+                                           user=username, key=credential)
+        else:
+            self.ssh_session = ssh.Session('127.0.0.1', port=port,
+                                           user=username, password=credential)
+        for i in range(10):
+            try:
+                self.ssh_session.connect()
+                return
+            except:
+                time.sleep(4)
+                pass
+        self.fail('ssh connection timeout')
+
+    def ssh_command(self, command):
+        self.ssh_logger.info(command)
+        result = self.ssh_session.cmd(command)
+        stdout_lines = [line.rstrip() for line
+                        in result.stdout_text.splitlines()]
+        for line in stdout_lines:
+            self.ssh_logger.info(line)
+        stderr_lines = [line.rstrip() for line
+                        in result.stderr_text.splitlines()]
+        for line in stderr_lines:
+            self.ssh_logger.warning(line)
+
+        self.assertEqual(result.exit_status, 0,
+                         f'Guest command failed: {command}')
+        return stdout_lines, stderr_lines
+
+
+class LinuxTest(Test, LinuxSSHMixIn):
 """Facilitates having a cloud-image Linux based available.
 
 For tests that indend to interact with guests, this is a better choice
diff --git a/tests/acceptance/linux_ssh_mips_malta.py 
b/tests/acceptance/linux_ssh_mips_malta.py
index 052008f02d4..61c9079d047 100644
--- a/tests/acceptance/linux_ssh_mips_malta.py
+++ b/tests/acceptance/linux_ssh_mips_malta.py
@@ -12,16 +12,14 @@
 import time
 
 from avocado import skipUnless
-from avocado_qemu import Test
+from avocado_qemu import Test, LinuxSSHMixIn
 from avocado_qemu import wait_for_console_pattern
 from avocado.utils import process
 from avocado.utils import archive
 from avocado.utils import ssh
 
-from qemu.utils import get_info_usernet_hostfwd_port
 
-
-class LinuxSSH(Test):
+class LinuxSSH(Test, LinuxSSHMixIn):
 
 timeout = 150 # Not for 'configure --enable-debug --enable-debug-tcg'
 
@@ -72,41 +70,9 @@ def get_kernel_info(self, endianess, wordsize):
 def setUp(self):
 super(LinuxSSH, self).setUp()
 
-def ssh_connect(self, username, password):
-self.ssh_logger = logging.getLogger('ssh')
-res = self.vm.command('human-monitor-command',
-  command_line='info usernet')
-port = get_info_usernet_hostfwd_port(res)
-if not port:
-self.cancel("Failed to retrieve SSH port")
-self.log.debug("sshd listening on port:" + port)
-self.ssh_session = ssh.Session(self.VM_IP, port=int(port),
-

[PATCH v3 11/11] tests/acceptance/virtiofs_submounts.py: fix setup of SSH pubkey

2021-04-11 Thread Cleber Rosa

The public key argument should be a path to a file, and not the
public key data.
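
To illustrate why the distinction matters: a callee that expects a path will try to open whatever string it receives. The sketch below uses a hypothetical `load_pubkey()` consumer and a made-up key string; it is not the avocado_qemu code, only a stand-in with the same expectation.

```python
import os
import tempfile


def load_pubkey(pubkey_path):
    """Hypothetical consumer: expects a path and reads the key itself."""
    with open(pubkey_path) as f:
        return f.read().strip()


def demo():
    """Return (key read via path, whether passing raw key data failed)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'id_ed25519.pub')
        key_data = 'ssh-ed25519 AAAAexample test@host'  # made-up key text
        with open(path, 'w') as f:
            f.write(key_data + '\n')

        ok = load_pubkey(path)       # passing the path works
        try:
            load_pubkey(key_data)    # the data is mistaken for a file name
            failed = False
        except OSError:
            failed = True
        return ok, failed
```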

Reported-by: Wainer dos Santos Moschetta 
Signed-off-by: Cleber Rosa 
---
 tests/acceptance/virtiofs_submounts.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/acceptance/virtiofs_submounts.py 
b/tests/acceptance/virtiofs_submounts.py
index d77ee356740..21ad7d792e7 100644
--- a/tests/acceptance/virtiofs_submounts.py
+++ b/tests/acceptance/virtiofs_submounts.py
@@ -195,7 +195,7 @@ def setUp(self):
 
 self.run(('ssh-keygen', '-N', '', '-t', 'ed25519', '-f', self.ssh_key))
 
-pubkey = open(self.ssh_key + '.pub').read()
+pubkey = self.ssh_key + '.pub'
 
 super(VirtiofsSubmountsTest, self).setUp(pubkey)
 
-- 
2.30.2

[PATCH v3 07/11] Acceptance Tests: set up SSH connection by default after boot for LinuxTest

2021-04-11 Thread Cleber Rosa

The LinuxTest specifically targets users that need to interact with Linux
guests.  So, it makes sense to give a connection by default, and avoid
requiring it as boiler-plate code.
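
The opt-out pattern this patch introduces can be sketched in isolation. `FakeLinuxTest` below is a hypothetical stand-in that only records what happened; the parameter name `set_up_ssh_connection` is the one used in the diff.

```python
class FakeLinuxTest:
    """Stand-in for a LinuxTest-like base class (hypothetical)."""

    def __init__(self):
        self.events = []

    def boot(self):
        self.events.append('boot')

    def ssh_connect(self):
        self.events.append('ssh')

    def launch_and_wait(self, set_up_ssh_connection=True):
        # Boot first; connect only when the caller has not opted out.
        self.boot()
        if set_up_ssh_connection:
            self.ssh_connect()
```

Tests that only care about a successful boot pass `set_up_ssh_connection=False`, which is what the boot_linux.py hunks below do.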

Signed-off-by: Cleber Rosa 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Willian Rampazzo 
---
 tests/acceptance/avocado_qemu/__init__.py |  5 -
 tests/acceptance/boot_linux.py| 18 +-
 tests/acceptance/virtiofs_submounts.py|  1 -
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/tests/acceptance/avocado_qemu/__init__.py 
b/tests/acceptance/avocado_qemu/__init__.py
index 25f871f5bc6..1062a851b97 100644
--- a/tests/acceptance/avocado_qemu/__init__.py
+++ b/tests/acceptance/avocado_qemu/__init__.py
@@ -391,7 +391,7 @@ def set_up_cloudinit(self, ssh_pubkey=None):
         cloudinit_iso = self.prepare_cloudinit(ssh_pubkey)
         self.vm.add_args('-drive', 'file=%s,format=raw' % cloudinit_iso)
 
-    def launch_and_wait(self):
+    def launch_and_wait(self, set_up_ssh_connection=True):
         self.vm.set_console()
         self.vm.launch()
         console_drainer = datadrainer.LineLogger(self.vm.console_socket.fileno(),
@@ -399,3 +399,6 @@ def launch_and_wait(self):
         console_drainer.start()
         self.log.info('VM launched, waiting for boot confirmation from guest')
         cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port), self.name)
+        if set_up_ssh_connection:
+            self.log.info('Setting up the SSH connection')
+            self.ssh_connect(self.username, self.ssh_key)
diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
index 0d178038a09..314370fd1f5 100644
--- a/tests/acceptance/boot_linux.py
+++ b/tests/acceptance/boot_linux.py
@@ -29,7 +29,7 @@ def test_pc_i440fx_tcg(self):
 """
 self.require_accelerator("tcg")
 self.vm.add_args("-accel", "tcg")
-self.launch_and_wait()
+self.launch_and_wait(set_up_ssh_connection=False)
 
 def test_pc_i440fx_kvm(self):
 """
@@ -38,7 +38,7 @@ def test_pc_i440fx_kvm(self):
 """
 self.require_accelerator("kvm")
 self.vm.add_args("-accel", "kvm")
-self.launch_and_wait()
+self.launch_and_wait(set_up_ssh_connection=False)
 
 def test_pc_q35_tcg(self):
 """
@@ -47,7 +47,7 @@ def test_pc_q35_tcg(self):
 """
 self.require_accelerator("tcg")
 self.vm.add_args("-accel", "tcg")
-self.launch_and_wait()
+self.launch_and_wait(set_up_ssh_connection=False)
 
 def test_pc_q35_kvm(self):
 """
@@ -56,7 +56,7 @@ def test_pc_q35_kvm(self):
 """
 self.require_accelerator("kvm")
 self.vm.add_args("-accel", "kvm")
-self.launch_and_wait()
+self.launch_and_wait(set_up_ssh_connection=False)
 
 
 class BootLinuxAarch64(LinuxTest):
@@ -85,7 +85,7 @@ def test_virt_tcg(self):
 self.vm.add_args("-cpu", "max")
 self.vm.add_args("-machine", "virt,gic-version=2")
 self.add_common_args()
-self.launch_and_wait()
+self.launch_and_wait(set_up_ssh_connection=False)
 
 def test_virt_kvm_gicv2(self):
 """
@@ -98,7 +98,7 @@ def test_virt_kvm_gicv2(self):
 self.vm.add_args("-cpu", "host")
 self.vm.add_args("-machine", "virt,gic-version=2")
 self.add_common_args()
-self.launch_and_wait()
+self.launch_and_wait(set_up_ssh_connection=False)
 
 def test_virt_kvm_gicv3(self):
 """
@@ -111,7 +111,7 @@ def test_virt_kvm_gicv3(self):
 self.vm.add_args("-cpu", "host")
 self.vm.add_args("-machine", "virt,gic-version=3")
 self.add_common_args()
-self.launch_and_wait()
+self.launch_and_wait(set_up_ssh_connection=False)
 
 
 class BootLinuxPPC64(LinuxTest):
@@ -128,7 +128,7 @@ def test_pseries_tcg(self):
 """
 self.require_accelerator("tcg")
 self.vm.add_args("-accel", "tcg")
-self.launch_and_wait()
+self.launch_and_wait(set_up_ssh_connection=False)
 
 
 class BootLinuxS390X(LinuxTest):
@@ -146,4 +146,4 @@ def test_s390_ccw_virtio_tcg(self):
 """
 self.require_accelerator("tcg")
 self.vm.add_args("-accel", "tcg")
-self.launch_and_wait()
+self.launch_and_wait(set_up_ssh_connection=False)
diff --git a/tests/acceptance/virtiofs_submounts.py 
b/tests/acceptance/virtiofs_submounts.py
index e10a935ac4e..e019d3b896b 100644
--- a/tests/acceptance/virtiofs_submounts.py
+++ b/tests/acceptance/virtiofs_submounts.py
@@ -136,7 +136,6 @@ def set_up_virtiofs(self):
 
 def launch_vm(self):
 self.launch_and_wait()
-self.ssh_connect('root', self.ssh_key)
 
 def set_up_nested_mounts(self):
 scratch_dir = os.path.join(self.shared_dir, 'scratch')
-- 
2.30.2

Re: Better alternative to strncpy in QEMU.

2021-04-11 Thread Thomas Huth


On 11/04/2021 15.50, Chetan wrote:

Hello All,

This mail is in reference to one of the tasks mentioned in 
'/Contribute/BiteSizedTasks/' in QEMU wiki, under '/API conversion/' which 
states to introduce a better alternative to strncpy function.


Looks like this task has been added by Paolo, so I'm adding him to Cc: now.

( 
https://wiki.qemu.org/index.php?title=Contribute/BiteSizedTasks&diff=9130&oldid=9045 
)


I've drafted 
and tested below implementation for the same. Before proceeding with any 
changes in QEMU code can you all please go through it and suggest 
changes/corrections if required.


/* This function is introduced in place of strncpy(), it asserts if destination
  * is large enough to fit strlen(source)+1 bytes and guarantees null
  * termination in destination string.
  *
  * char source[], is expecting a pointer to the source where data should be
  * copied from.
  *
  * char destination[], is expecting a pointer to the destination where data
  * should be copied to.
  *
  * size_t destination_size, is expecting size of destination.
  * In case of char[], sizeof() function can be used to find the size.
  * In case of char *, provide value which was passed to malloc() function for
  * memory allocation.
  */
char *qemu_strncpy(char destination[], char source[], size_t destination_size)


Please use "*destination" and "*source" instead of "destination[]" and 
"source[]" here.



{
     /* Looping through the array and copying the characters from
      * source to destination.
      */
     for (int i = 0; i < strlen(source); i++) {
         destination[i] = source[i];

         /* Check if value of i is equal to the second last index
          * of destination array and if condition is true, mark last
          * index as NULL and break from the loop.
          */
         if (i == (destination_size - 2)) {
             destination[destination_size - 1] = '\0';
             break;
         }
     }
     return destination;
}


I think this is pretty much the same as g_strlcpy() from the glib:

https://developer.gnome.org/glib/2.66/glib-String-Utility-Functions.html#g-strlcpy

So I guess Paolo had something different in mind when adding this task?
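
For readers comparing the two behaviours, here is a rough Python model of the semantics on a bytearray. It models only the documented behaviour of strncpy() (pad with NULs, no terminator when the source does not fit) and of a g_strlcpy()-style copy (always terminate, return the source length); the function names are hypothetical and this is neither glib nor QEMU code.

```python
def strncpy_model(dest, src, n):
    """Model of C strncpy(dest, src, n): writes exactly n bytes."""
    for i in range(n):
        # Copy from src while it lasts, then pad with NUL bytes.
        dest[i] = src[i] if i < len(src) else 0
    return dest


def strlcpy_model(dest, src, dest_size):
    """Model of a g_strlcpy()-style copy; returns len(src)."""
    if dest_size > 0:
        count = min(len(src), dest_size - 1)
        dest[:count] = src[:count]
        dest[count] = 0  # always NUL-terminate
    return len(src)
```

With a 4-byte destination and the source `b'abcdef'`, the first model leaves `b'abcd'` (no terminator), while the second leaves `b'abc\x00'` and returns 6, which lets the caller detect truncation.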


/* This function is introduced in place of strncpy(), it asserts if destination
  * is large enough to fit strlen(source) bytes and does not guarantee null
  * termination in destination string.
  *
  * char source[], is expecting a pointer to the source where data should be
  * copied from.
  *
  * char destination[], is expecting a pointer to the destination where data
  * should be copied to.
  *
  * size_t destination_size, is expecting size of destination.
  * In case of char[], sizeof() function can be used to find the size.
  * In case of char *, provide value which was passed to malloc() function for
  * memory allocation.
  */
char *qemu_strncpy_nonul(char destination[], char source[], size_t destination_size)

{
     /* Looping through the array and copying the characters from
      * source to destination.
      */
     for (int i = 0; i < strlen(source); i++) {
         destination[i] = source[i];

         /* Check if value of i is equal to the last index
          * of the destination array and if condition is true,
          * break from the loop.
          */
         if (i == (destination_size - 1)) {
             break;
         }
     }
     return destination;
}


I'm not sure what's the improvement over strncpy() here? Paolo, could you 
elaborate?
(Note that we also have some functions like strpadcpy() in QEMU already, 
which can be used in similar ways)


 Thomas

[PATCH v3 09/11] Acceptance Tests: add basic documentation on LinuxTest base class

2021-04-11 Thread Cleber Rosa

Signed-off-by: Cleber Rosa 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Willian Rampazzo 
Reviewed-by: Eric Auger 
Reviewed-by: Wainer dos Santos Moschetta 
---
 docs/devel/testing.rst | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 1da4c4e4c4e..4e423928106 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -810,6 +810,32 @@ and hypothetical example follows:
 At test "tear down", ``avocado_qemu.Test`` handles all the QEMUMachines
 shutdown.
 
+The ``avocado_qemu.LinuxTest`` base test class
+~~
+
+The ``avocado_qemu.LinuxTest`` is a further specialization of the
+``avocado_qemu.Test`` class, so it contains all the characteristics of
+the latter plus some extra features.
+
+First of all, this base class is intended for tests that need to
+interact with a fully booted and operational Linux guest.  At this
+time, it uses a Fedora 31 guest image.  The most basic example looks
+like this:
+
+.. code::
+
+  from avocado_qemu import LinuxTest
+
+
+  class SomeTest(LinuxTest):
+
+      def test(self):
+          self.launch_and_wait()
+          self.ssh_command('some_command_to_be_run_in_the_guest')
+
+Please refer to tests that use ``avocado_qemu.LinuxTest`` under
+``tests/acceptance`` for more examples.
+
 QEMUMachine
 ~~~~~~~~~~~
 
-- 
2.30.2

Re: [PATCH 1/4] target/ppc: Code motion required to build disabling tcg

2021-04-11 Thread David Gibson

On Fri, Apr 09, 2021 at 04:48:41PM -0300, Fabiano Rosas wrote:
> "Bruno Larsen (billionai)"  writes:
> 
> A general advice for this whole series is: make sure you add in some
> words explaining why you decided to make a particular change. It will be
> much easier to review if we know what were the logical steps leading to
> the change.
> 
> > This commit does the necessary code motion from translate_init.c.inc
> 
> For instance, I don't immediately see why these changes are necessary. I
> see that translate_init.c.inc already has some `#ifdef CONFIG_TCG`, so
> why do we need to move a bunch of code into cpu.c instead of just adding
> more code under ifdef CONFIG_TCG? (I'm not saying it's wrong, just trying to
> understand the reasoning).
> 
> Is translate_init.c.inc intended to be TCG only? But then I see you
> moved TCG-only functions out of it (ppc_fixup_cpu) and left not TCG-only
> functions (gen_spr_generic).
> 
> > This moves all functions that start with gdb_* into target/ppc/gdbstub.c
> > and creates a new function that calls those and is called by ppc_cpu_realize
> 
> This looks like it makes sense regardless of disable-tcg, could we have
> it in a standalone patch?
> 
> > All functions related to realizing the cpu have been moved to cpu.c, which
> > may call functions from gdbstub or translate_init
> 
> Again, I don't disagree with this, but at first sight it doesn't seem
> entirely related to disabling TCG.

Fabiano's points seconded.  This isn't necessarily a bad idea, but a
rationale would really help.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [RFC PATCH 0/4] target/ppc: add disable-tcg option

2021-04-11 Thread David Gibson

On Fri, Apr 09, 2021 at 08:57:48AM -0700, no-re...@patchew.org wrote:
> Patchew URL: 
> https://patchew.org/QEMU/20210409151916.97326-1-bruno.lar...@eldorado.org.br/
> 
> 
> 
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:
> 
> Type: series
> Message-id: 20210409151916.97326-1-bruno.lar...@eldorado.org.br
> Subject: [RFC PATCH 0/4] target/ppc: add disable-tcg option

You will need to fix these style errors.

Note that there's quite a bit of existing code in target-ppc which
doesn't have current-correct qemu style.  Despite this, please make
any changes in a checkpatch happy style - the hope is that we'll
gradually convert the legacy pieces to updated style.

> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> git rev-parse base > /dev/null || exit 0
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> git config --local diff.algorithm histogram
> ./scripts/checkpatch.pl --mailback base..
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
>  - [tag update]  
> patchew/161786467973.295167.5612704777283969903.st...@bahia.lan -> 
> patchew/161786467973.295167.5612704777283969903.st...@bahia.lan
>  - [tag update]  patchew/20210409150527.15053-1-peter.mayd...@linaro.org 
> -> patchew/20210409150527.15053-1-peter.mayd...@linaro.org
>  * [new tag] 
> patchew/20210409151916.97326-1-bruno.lar...@eldorado.org.br -> 
> patchew/20210409151916.97326-1-bruno.lar...@eldorado.org.br
> Switched to a new branch 'test'
> 0250bc9 target/ppc: updated build rules for disable-tcg option
> e36c2a7 target/ppc: Add stubs for tcg functions, so it builds
> 4e6d44d target/ppc: added solutions for building with disable-tcg
> 38ccad3 target/ppc: Code motion required to build disabling tcg
> 
> === OUTPUT BEGIN ===
> 1/4 Checking commit 38ccad308a44 (target/ppc: Code motion required to build 
> disabling tcg)
> 2/4 Checking commit 4e6d44d2a68a (target/ppc: added solutions for building 
> with disable-tcg)
> WARNING: Block comments use a leading /* on a separate line
> #43: FILE: target/ppc/arch_dump.c:182:
> +/* This is the first solution implemented. My personal favorite as it
> 
> WARNING: Block comments use a trailing */ on a separate line
> #44: FILE: target/ppc/arch_dump.c:183:
> + * allows for explicit error handling, however it is much less readable 
> */
> 
> ERROR: space required before the open brace '{'
> #46: FILE: target/ppc/arch_dump.c:185:
> +if(kvm_enabled()){
> 
> ERROR: space required before the open parenthesis '('
> #46: FILE: target/ppc/arch_dump.c:185:
> +if(kvm_enabled()){
> 
> ERROR: space required after that close brace '}'
> #48: FILE: target/ppc/arch_dump.c:187:
> +}else
> 
> WARNING: line over 80 characters
> #55: FILE: target/ppc/arch_dump.c:194:
> +/* TODO: add proper error handling, even tough this should never be 
> reached */
> 
> ERROR: space required before the open brace '{'
> #79: FILE: target/ppc/kvm.c:2953:
> +int kvmppc_mtvscr(PowerPCCPU *cpu, uint32_t val){
> 
> ERROR: space required before the open brace '{'
> #87: FILE: target/ppc/kvm.c:2961:
> +if(ret < 0){
> 
> ERROR: space required before the open parenthesis '('
> #87: FILE: target/ppc/kvm.c:2961:
> +if(ret < 0){
> 
> ERROR: space required before the open brace '{'
> #93: FILE: target/ppc/kvm.c:2967:
> +int kvmppc_mfvscr(PowerPCCPU *cpu){
> 
> ERROR: space required before the open brace '{'
> #101: FILE: target/ppc/kvm.c:2975:
> +if(ret < 0){
> 
> ERROR: space required before the open parenthesis '('
> #101: FILE: target/ppc/kvm.c:2975:
> +if(ret < 0){
> 
> ERROR: "(foo*)" should be "(foo *)"
> #115: FILE: target/ppc/kvm_ppc.h:90:
> +int kvmppc_mfvscr(PowerPCCPU*);
> 
> WARNING: Block comments use a leading /* on a separate line
> #117: FILE: target/ppc/kvm_ppc.h:92:
> +/* This is the second (quick and dirty) solution. Not my personal favorite
> 
> WARNING: Block comments use a trailing */ on a separate line
> #119: FILE: target/ppc/kvm_ppc.h:94:
> + * for error checking. but it requires less change in other files */
> 
> ERROR: space required after that ',' (ctx:VxV)
> #121: FILE: target/ppc/kvm_ppc.h:96:
> +#define helper_mtvscr(env, val) kvmppc_mtvscr(env_archcpu(env),val)
>^
> 
> ERROR: space required before the open brace '{'
> #148: FILE: target/ppc/machine.c:101:
> +if(kvm_enabled()){
> 
> ERROR: space required before the open parenthesis '('
> #148: FILE: target/ppc/machine.c:101:
> +if(kvm_enabled()){
> 
> ERROR: space required after that close brace '}'
> #150: FILE: target/ppc/machine.c:103:
> +}else
> 
> WARNING: line over 80 characters
> #156: FILE: target/ppc/machine.c:109:
> +/* TODO: Add correct error handling, even though this should never 
> be reached */
> 
> ERROR: space required before the open brace '{'
> #167: FILE: target/

Re: [PATCH 2/4] target/ppc: added solutions for building with disable-tcg

2021-04-11 Thread David Gibson

On Fri, Apr 09, 2021 at 12:19:14PM -0300, Bruno Larsen (billionai) wrote:
> this commit presents 2 possible solutions for substituting TCG emulation
> with KVM calls. One - used in machine.c and arch_dump.c - explicitly
> adds the KVM function and has the possibility of adding the TCG one
> for more generic compilation, prioritizing the KVM option. The second
> option, implemented in kvm_ppc.h, transparently changes the helper
> into the KVM call, if TCG is not enabled. I believe the first solution
> is better, but it is less readable, so I wanted to have some feedback
> 
> Signed-off-by: Bruno Larsen (billionai) 
> ---
>  target/ppc/arch_dump.c | 17 +
>  target/ppc/kvm.c   | 30 ++
>  target/ppc/kvm_ppc.h   | 11 +++
>  target/ppc/machine.c   | 33 -
>  4 files changed, 90 insertions(+), 1 deletion(-)
> 
> diff --git a/target/ppc/arch_dump.c b/target/ppc/arch_dump.c
> index 9ab04b2c38..c53e01011a 100644
> --- a/target/ppc/arch_dump.c
> +++ b/target/ppc/arch_dump.c
> @@ -17,7 +17,10 @@
>  #include "elf.h"
>  #include "sysemu/dump.h"
>  #include "sysemu/kvm.h"
> +#include "kvm_ppc.h"
> +#if defined(CONFIG_TCG)
>  #include "exec/helper-proto.h"
> +#endif /* CONFIG_TCG */
>  
>  #ifdef TARGET_PPC64
>  #define ELFCLASS ELFCLASS64
> @@ -176,7 +179,21 @@ static void ppc_write_elf_vmxregset(NoteFuncArg *arg, 
> PowerPCCPU *cpu)
>  vmxregset->avr[i].u64[1] = avr->u64[1];
>  }
>  }
> +/* This is the first solution implemented. My personal favorite as it
> + * allows for explicit error handling, however it is much less readable 
> */
> +#if defined(CONFIG_KVM)
> +if(kvm_enabled()){
> +vmxregset->vscr.u32[3] = cpu_to_dump32(s, kvmppc_mfvscr(cpu));
> +}else
> +#endif
> +
> +#if defined(CONFIG_TCG)
>  vmxregset->vscr.u32[3] = cpu_to_dump32(s, helper_mfvscr(&cpu->env));
> +#else
> +{
> +/* TODO: add proper error handling, even tough this should never be 
> reached */
> +}
> +#endif

I think this is more complex than you need.  AFAICT, the logic in
helper_mfvscr() is still valid even without TCG (we still have a copy
of the state in 'env' with KVM).

You could move helper_mfvscr() to a file that isn't going to get
excluded for !TCG builds.

Not directly related to what you're trying to accomplish here, but the
whole vscr_sat thing looks really weird.  I have no idea why we're
splitting out the storage of VSCR[SAT] for the TCG case at all.  If
you wanted to clean that up as a preliminary, I'd be grateful.

>  }
>  
>  static void ppc_write_elf_vsxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 104a308abb..8ed54d12d8 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -51,6 +51,7 @@
>  #include "elf.h"
>  #include "sysemu/kvm_int.h"
>  
> +
>  #define PROC_DEVTREE_CPU  "/proc/device-tree/cpus/"
>  
>  #define DEBUG_RETURN_GUEST 0
> @@ -2947,3 +2948,32 @@ bool kvm_arch_cpu_check_are_resettable(void)
>  {
>  return true;
>  }
> +
> +/* Functions added to replace helper_m(t|f)vscr from int_helper.c */
> +int kvmppc_mtvscr(PowerPCCPU *cpu, uint32_t val){
> +CPUState *cs = CPU(cpu);
> +CPUPPCState *env = &cpu->env;
> +struct kvm_one_reg reg;
> +int ret;
> +reg.id = KVM_REG_PPC_VSCR;
> +reg.addr = (uintptr_t) &env->vscr;
> +ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
> +if(ret < 0){
> +fprintf(stderr, "Unable to set VSCR to KVM: %s", strerror(errno));
> +}
> +return ret;
> +}
> +
> +int kvmppc_mfvscr(PowerPCCPU *cpu){
> +CPUState *cs = CPU(cpu);
> +CPUPPCState *env = &cpu->env;
> +struct kvm_one_reg reg;
> +int ret;
> +reg.id = KVM_REG_PPC_VSCR;
> +reg.addr = (uintptr_t) &env->vscr;
> +ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
> +if(ret < 0){
> +fprintf(stderr, "Unable to get VSCR to KVM: %s", strerror(errno));
> +}
> +return ret;
> +}
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index 989f61ace0..f618cb28b1 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -86,6 +86,17 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t 
> tb_offset);
>  
>  int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run);
>  
> +int kvmppc_mtvscr(PowerPCCPU*, uint32_t);
> +int kvmppc_mfvscr(PowerPCCPU*);
> +
> +/* This is the second (quick and dirty) solution. Not my personal favorite
> + * as it hides what is actually happening from the user and doesn't allow
> + * for error checking. but it requires less change in other files */
> +#ifndef CONFIG_TCG
> +#define helper_mtvscr(env, val) kvmppc_mtvscr(env_archcpu(env),val)
> +#define helper_mfvscr(env) kvmppc_mfvscr(env_archcpu(env))
> +#endif
> +
>  #else
>  
>  static inline uint32_t kvmppc_get_tbfreq(void)
> diff --git a/target/ppc/machine.c b/target/ppc/machine.c
> index 283db1d28a..d92bc18859 100644
> --- a/target/ppc/machine.c
> +++ b/t

Re: [RFC PATCH 0/4] target/ppc: add disable-tcg option

2021-04-11 Thread David Gibson

For future reference, please CC me explicitly on things you'd like me
to review.  I do scan the qemu-...@nongnu.org list, but it makes it
easier for me to find (and less likely that I'll accidentally overlook
it) if I'm also CCed directly.

On Fri, Apr 09, 2021 at 12:19:12PM -0300, Bruno Larsen (billionai) wrote:
> This patch series aims to add the option to build without TCG for the
> powerpc target. This RFC shows mostly the strategies employed when
> dealing with compilation problems, and ask for input on the bits
> we don't quite understand yet.
> 
> The first patch mostly code motion, as referenced in 2021-04/msg0717.
> The second patch shows the 2 strategies we've considered, and hope to
> get feedback on. The third patch contains the stubs we haven't decided
> on how to deal with yet, but needed to exist to compile the project.
> The final patch just changes the meson.build rules
> 
> Bruno Larsen (billionai) (4):
>   target/ppc: Code motion required to build disabling tcg
>   target/ppc: added solutions for problems encountered when building
> with disable-tcg
>   target/ppc: Add stubs for tcg functions, so it build with disable-tcg
>   target/ppc: updated build rules for disable-tcg option
> 
>  target/ppc/arch_dump.c  |   17 +
>  target/ppc/cpu.c|  859 +++
>  target/ppc/cpu.h|   15 +
>  target/ppc/gdbstub.c|  253 +++
>  target/ppc/kvm.c|   30 +
>  target/ppc/kvm_ppc.h|   11 +
>  target/ppc/machine.c|   33 +-
>  target/ppc/meson.build  |   22 +-
>  target/ppc/tcg-stub.c   |  139 
>  target/ppc/translate_init.c.inc | 1148 +--
>  10 files changed, 1407 insertions(+), 1120 deletions(-)
>  create mode 100644 target/ppc/tcg-stub.c
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

[PATCH RFC v5 03/12] target/riscv: Implement function kvm_arch_init_vcpu

2021-04-11 Thread Yifei Jiang

Get the ISA info from KVM when initializing the vCPU.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/kvm.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 687dd4b621..0d924be33f 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -38,6 +38,18 @@
 #include "qemu/log.h"
 #include "hw/loader.h"
 
+static __u64 kvm_riscv_reg_id(CPURISCVState *env, __u64 type, __u64 idx)
+{
+    __u64 id = KVM_REG_RISCV | type | idx;
+
+    if (riscv_cpu_is_32bit(env)) {
+        id |= KVM_REG_SIZE_U32;
+    } else {
+        id |= KVM_REG_SIZE_U64;
+    }
+    return id;
+}
+
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
 };
@@ -79,7 +91,20 @@ void kvm_arch_init_irq_routing(KVMState *s)
 
 int kvm_arch_init_vcpu(CPUState *cs)
 {
-    return 0;
+    int ret = 0;
+    target_ulong isa;
+    RISCVCPU *cpu = RISCV_CPU(cs);
+    CPURISCVState *env = &cpu->env;
+    __u64 id;
+
+    id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG, KVM_REG_RISCV_CONFIG_REG(isa));
+    ret = kvm_get_one_reg(cs, id, &isa);
+    if (ret) {
+        return ret;
+    }
+    env->misa = isa | RVXLEN;
+
+    return ret;
 }
 
 int kvm_arch_msi_data_to_gsi(uint32_t data)
-- 
2.19.1
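
As a rough illustration of what kvm_riscv_reg_id() above computes, the register id is a bitwise OR of an architecture tag, a size field, a register type and an index. The sketch below models that in Python. The constant values are written from memory of the Linux uapi headers and should be treated as assumptions to verify against linux/kvm.h and the RISC-V asm/kvm.h; the `is_32bit` flag stands in for riscv_cpu_is_32bit(env).

```python
# Assumed values; check them against the kernel headers before relying on them.
KVM_REG_RISCV = 0x8000000000000000
KVM_REG_SIZE_U32 = 0x0020000000000000
KVM_REG_SIZE_U64 = 0x0030000000000000
KVM_REG_RISCV_CONFIG = 0x01 << 24


def kvm_riscv_reg_id(is_32bit, reg_type, idx):
    """Compose a register id the same way the C helper does."""
    reg_id = KVM_REG_RISCV | reg_type | idx
    # The size field depends on the guest register width.
    reg_id |= KVM_REG_SIZE_U32 if is_32bit else KVM_REG_SIZE_U64
    return reg_id
```

Under these assumed constants, a 64-bit config register with index 0 would be `0x8030000001000000`.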

[PATCH RFC v5 02/12] target/riscv: Add target/riscv/kvm.c to place the public kvm interface

2021-04-11 Thread Yifei Jiang

Add target/riscv/kvm.c to hold the kvm_arch_* functions needed by
kvm/kvm-all.c. Also add KVM support to the meson.build files.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
Reviewed-by: Alistair Francis 
---
 meson.build  |   2 +
 target/riscv/kvm.c   | 133 +++
 target/riscv/meson.build |   1 +
 3 files changed, 136 insertions(+)
 create mode 100644 target/riscv/kvm.c

diff --git a/meson.build b/meson.build
index c6f4b0cf5e..1eab53f03e 100644
--- a/meson.build
+++ b/meson.build
@@ -72,6 +72,8 @@ elif cpu in ['ppc', 'ppc64']
   kvm_targets = ['ppc-softmmu', 'ppc64-softmmu']
 elif cpu in ['mips', 'mips64']
   kvm_targets = ['mips-softmmu', 'mipsel-softmmu', 'mips64-softmmu', 
'mips64el-softmmu']
+elif cpu in ['riscv32', 'riscv64']
+  kvm_targets = ['riscv32-softmmu', 'riscv64-softmmu']
 else
   kvm_targets = []
 endif
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
new file mode 100644
index 00..687dd4b621
--- /dev/null
+++ b/target/riscv/kvm.c
@@ -0,0 +1,133 @@
+/*
+ * RISC-V implementation of KVM hooks
+ *
+ * Copyright (c) 2020 Huawei Technologies Co., Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include 
+
+#include 
+
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_int.h"
+#include "cpu.h"
+#include "trace.h"
+#include "hw/pci/pci.h"
+#include "exec/memattrs.h"
+#include "exec/address-spaces.h"
+#include "hw/boards.h"
+#include "hw/irq.h"
+#include "qemu/log.h"
+#include "hw/loader.h"
+
+const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
+    KVM_CAP_LAST_INFO
+};
+
+int kvm_arch_get_registers(CPUState *cs)
+{
+    return 0;
+}
+
+int kvm_arch_put_registers(CPUState *cs, int level)
+{
+    return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+    return 0;
+}
+
+int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+                             uint64_t address, uint32_t data, PCIDevice *dev)
+{
+    return 0;
+}
+
+int kvm_arch_destroy_vcpu(CPUState *cs)
+{
+    return 0;
+}
+
+unsigned long kvm_arch_vcpu_id(CPUState *cpu)
+{
+    return cpu->cpu_index;
+}
+
+void kvm_arch_init_irq_routing(KVMState *s)
+{
+}
+
+int kvm_arch_init_vcpu(CPUState *cs)
+{
+    return 0;
+}
+
+int kvm_arch_msi_data_to_gsi(uint32_t data)
+{
+    abort();
+}
+
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+                                int vector, PCIDevice *dev)
+{
+    return 0;
+}
+
+int kvm_arch_init(MachineState *ms, KVMState *s)
+{
+    return 0;
+}
+
+int kvm_arch_irqchip_create(KVMState *s)
+{
+    return 0;
+}
+
+int kvm_arch_process_async_events(CPUState *cs)
+{
+    return 0;
+}
+
+void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
+{
+}
+
+MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
+{
+    return MEMTXATTRS_UNSPECIFIED;
+}
+
+bool kvm_arch_stop_on_emulation_error(CPUState *cs)
+{
+    return true;
+}
+
+int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
+{
+    return 0;
+}
+
+bool kvm_arch_cpu_check_are_resettable(void)
+{
+    return true;
+}
diff --git a/target/riscv/meson.build b/target/riscv/meson.build
index 88ab850682..32afd6e882 100644
--- a/target/riscv/meson.build
+++ b/target/riscv/meson.build
@@ -23,6 +23,7 @@ riscv_ss.add(files(
   'vector_helper.c',
   'translate.c',
 ))
+riscv_ss.add(when: 'CONFIG_KVM', if_true: files('kvm.c'))
 
 riscv_softmmu_ss = ss.source_set()
 riscv_softmmu_ss.add(files(
-- 
2.19.1

[PATCH RFC v5 01/12] linux-header: Update linux/kvm.h

2021-04-11 Thread Yifei Jiang

Update linux-headers/linux/kvm.h from
https://github.com/avpatel/linux/tree/riscv_kvm_v17.
Only this header file is needed, so the other Linux headers are not
updated with update-linux-headers.sh until the above KVM series is accepted.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 linux-headers/linux/kvm.h | 97 +++
 1 file changed, 97 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 020b62a619..1e92fd2a76 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -216,6 +216,20 @@ struct kvm_hyperv_exit {
} u;
 };
 
+struct kvm_xen_exit {
+#define KVM_EXIT_XEN_HCALL  1
+   __u32 type;
+   union {
+   struct {
+   __u32 longmode;
+   __u32 cpl;
+   __u64 input;
+   __u64 result;
+   __u64 params[6];
+   } hcall;
+   } u;
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX1048576
 
@@ -251,6 +265,10 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_X86_RDMSR29
 #define KVM_EXIT_X86_WRMSR30
 #define KVM_EXIT_DIRTY_RING_FULL  31
+#define KVM_EXIT_AP_RESET_HOLD32
+#define KVM_EXIT_X86_BUS_LOCK 33
+#define KVM_EXIT_XEN  34
+#define KVM_EXIT_RISCV_SBI35
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -427,6 +445,15 @@ struct kvm_run {
__u32 index; /* kernel -> user */
__u64 data; /* kernel <-> user */
} msr;
+   /* KVM_EXIT_XEN */
+   struct kvm_xen_exit xen;
+   /* KVM_EXIT_RISCV_SBI */
+   struct {
+   unsigned long extension_id;
+   unsigned long function_id;
+   unsigned long args[6];
+   unsigned long ret[2];
+   } riscv_sbi;
/* Fix the size of the union. */
char padding[256];
};
@@ -573,6 +600,7 @@ struct kvm_vapic_addr {
 #define KVM_MP_STATE_CHECK_STOP6
 #define KVM_MP_STATE_OPERATING 7
 #define KVM_MP_STATE_LOAD  8
+#define KVM_MP_STATE_AP_RESET_HOLD 9
 
 struct kvm_mp_state {
__u32 mp_state;
@@ -1056,6 +1084,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
 #define KVM_CAP_SYS_HYPERV_CPUID 191
 #define KVM_CAP_DIRTY_LOG_RING 192
+#define KVM_CAP_X86_BUS_LOCK_EXIT 193
+#define KVM_CAP_PPC_DAWR1 194
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1129,6 +1159,11 @@ struct kvm_x86_mce {
 #endif
 
 #ifdef KVM_CAP_XEN_HVM
+#define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR   (1 << 0)
+#define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1 << 1)
+#define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2)
+#define KVM_XEN_HVM_CONFIG_RUNSTATE(1 << 3)
+
 struct kvm_xen_hvm_config {
__u32 flags;
__u32 msr;
@@ -1563,6 +1598,57 @@ struct kvm_pv_cmd {
 /* Available with KVM_CAP_DIRTY_LOG_RING */
 #define KVM_RESET_DIRTY_RINGS  _IO(KVMIO, 0xc7)
 
+/* Per-VM Xen attributes */
+#define KVM_XEN_HVM_GET_ATTR   _IOWR(KVMIO, 0xc8, struct kvm_xen_hvm_attr)
+#define KVM_XEN_HVM_SET_ATTR   _IOW(KVMIO,  0xc9, struct kvm_xen_hvm_attr)
+
+struct kvm_xen_hvm_attr {
+   __u16 type;
+   __u16 pad[3];
+   union {
+   __u8 long_mode;
+   __u8 vector;
+   struct {
+   __u64 gfn;
+   } shared_info;
+   __u64 pad[8];
+   } u;
+};
+
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
+#define KVM_XEN_ATTR_TYPE_LONG_MODE0x0
+#define KVM_XEN_ATTR_TYPE_SHARED_INFO  0x1
+#define KVM_XEN_ATTR_TYPE_UPCALL_VECTOR0x2
+
+/* Per-vCPU Xen attributes */
+#define KVM_XEN_VCPU_GET_ATTR  _IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr)
+#define KVM_XEN_VCPU_SET_ATTR  _IOW(KVMIO,  0xcb, struct kvm_xen_vcpu_attr)
+
+struct kvm_xen_vcpu_attr {
+   __u16 type;
+   __u16 pad[3];
+   union {
+   __u64 gpa;
+   __u64 pad[8];
+   struct {
+   __u64 state;
+   __u64 state_entry_time;
+   __u64 time_running;
+   __u64 time_runnable;
+   __u64 time_blocked;
+   __u64 time_offline;
+   } runstate;
+   } u;
+};
+
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO   0x0
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO  0x1
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR   0x2
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT0x3
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA   0x4
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST 0x5
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
/* Guest initializatio

[PATCH RFC v5 09/12] target/riscv: Add host cpu type

2021-04-11 Thread Yifei Jiang

The 'host' CPU type simply sets its ISA to RVXLEN; further ISA information
is obtained from KVM in kvm_arch_init_vcpu().

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/cpu.c | 9 +
 target/riscv/cpu.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index dd34ab4978..8132d35a92 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -216,6 +216,12 @@ static void rv32_imafcu_nommu_cpu_init(Object *obj)
 }
 #endif
 
+static void riscv_host_cpu_init(Object *obj)
+{
+CPURISCVState *env = &RISCV_CPU(obj)->env;
+set_misa(env, RVXLEN);
+}
+
 static ObjectClass *riscv_cpu_class_by_name(const char *cpu_model)
 {
 ObjectClass *oc;
@@ -706,6 +712,9 @@ static const TypeInfo riscv_cpu_type_infos[] = {
 .class_init = riscv_cpu_class_init,
 },
 DEFINE_CPU(TYPE_RISCV_CPU_ANY,  riscv_any_cpu_init),
+#if defined(CONFIG_KVM)
+DEFINE_CPU(TYPE_RISCV_CPU_HOST, riscv_host_cpu_init),
+#endif
 #if defined(TARGET_RISCV32)
 DEFINE_CPU(TYPE_RISCV_CPU_BASE32,   rv32_base_cpu_init),
 DEFINE_CPU(TYPE_RISCV_CPU_IBEX, rv32_ibex_cpu_init),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index a489d94187..3ca3dad341 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -43,6 +43,7 @@
 #define TYPE_RISCV_CPU_SIFIVE_E51   RISCV_CPU_TYPE_NAME("sifive-e51")
 #define TYPE_RISCV_CPU_SIFIVE_U34   RISCV_CPU_TYPE_NAME("sifive-u34")
 #define TYPE_RISCV_CPU_SIFIVE_U54   RISCV_CPU_TYPE_NAME("sifive-u54")
+#define TYPE_RISCV_CPU_HOST RISCV_CPU_TYPE_NAME("host")
 
 #if defined(TARGET_RISCV32)
 # define TYPE_RISCV_CPU_BASETYPE_RISCV_CPU_BASE32
-- 
2.19.1
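
To illustrate the two-stage setup described in this patch, here is a minimal, self-contained sketch. It assumes the usual misa layout (MXL in the top two bits, one bit per extension letter) and RV64. The macro and function names are hypothetical stand-ins, not QEMU's, and the way the real kvm_arch_init_vcpu() combines the probed value with the initial one is not shown in this message, so the OR below is only an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; QEMU's own definitions live in target/riscv/cpu.h. */
#define XLEN_BITS        64
#define MISA_EXT(letter) ((uint64_t)1 << ((letter) - 'A'))
#define MISA_MXL_RV64    ((uint64_t)2 << (XLEN_BITS - 2))

/* Stage 1 (CPU init): only the XLEN field is known, no extensions yet. */
static uint64_t host_misa_at_init(void)
{
    return MISA_MXL_RV64;
}

/* Stage 2 (vCPU init): merge the extension bits reported by the hypervisor. */
static uint64_t host_misa_after_probe(uint64_t base, uint64_t isa_from_kvm)
{
    return base | isa_from_kvm;
}

/* Test a single extension letter, 'A' through 'Z'. */
static int misa_has_ext(uint64_t misa, char letter)
{
    assert(letter >= 'A' && letter <= 'Z');
    return (misa & MISA_EXT(letter)) != 0;
}
```

Until the second stage has run, a check such as misa_has_ext(misa, 'D') is false for the host CPU, which is worth keeping in mind for code that runs between the two stages.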

[PATCH RFC v5 06/12] target/riscv: Support start kernel directly by KVM

2021-04-11 Thread Yifei Jiang

Get the kernel and FDT start addresses in virt.c and pass them to KVM
on CPU reset. In addition, add kvm_riscv.h to hold the RISC-V specific
interface.
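
As a rough illustration of that flow, the sketch below uses a hypothetical, trimmed-down vCPU structure rather than QEMU's CPURISCVState. It assumes the Linux RISC-V boot convention (a0 holds the hart ID, a1 the device tree address, satp is zero at entry). Only the a1 assignment is visible in this thread, so the rest is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vCPU state for illustration; not QEMU's CPURISCVState. */
struct vcpu_state {
    uint64_t pc;
    uint64_t gpr[32];
    uint64_t satp;
    uint64_t kernel_addr;  /* recorded once by the board code */
    uint64_t fdt_addr;     /* recorded once by the board code */
};

/* Board side: remember where the kernel and device tree were loaded. */
static void setup_direct_kernel(struct vcpu_state *harts, int n,
                                uint64_t kernel_addr, uint64_t fdt_addr)
{
    assert(harts != 0 && n >= 0);
    for (int i = 0; i < n; i++) {
        harts[i].kernel_addr = kernel_addr;
        harts[i].fdt_addr = fdt_addr;
    }
}

/* Reset side: enter the kernel directly, without firmware in between. */
static void reset_vcpu(struct vcpu_state *s, uint64_t hartid)
{
    assert(s != 0);
    s->pc = s->kernel_addr;
    s->gpr[10] = hartid;       /* a0 */
    s->gpr[11] = s->fdt_addr;  /* a1 */
    s->satp = 0;               /* address translation off at kernel entry */
}
```

Keeping the two addresses outside the part of the state that is cleared on reset is what lets every later reset re-enter the kernel at the same place.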

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 hw/riscv/boot.c  | 11 +++
 hw/riscv/virt.c  |  7 +++
 include/hw/riscv/boot.h  |  1 +
 target/riscv/cpu.c   |  8 
 target/riscv/cpu.h   |  3 +++
 target/riscv/kvm-stub.c  | 25 +
 target/riscv/kvm.c   | 13 +
 target/riscv/kvm_riscv.h | 24 
 target/riscv/meson.build |  2 +-
 9 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 target/riscv/kvm-stub.c
 create mode 100644 target/riscv/kvm_riscv.h

diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index 0d38bb7426..b9741a647d 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -290,3 +290,14 @@ void riscv_setup_rom_reset_vec(MachineState *machine, 
RISCVHartArrayState *harts
 
 return;
 }
+
+void riscv_setup_direct_kernel(hwaddr kernel_addr, hwaddr fdt_addr)
+{
+CPUState *cs;
+
+for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
+RISCVCPU *riscv_cpu = RISCV_CPU(cs);
+riscv_cpu->env.kernel_addr = kernel_addr;
+riscv_cpu->env.fdt_addr = fdt_addr;
+}
+}
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index c0dc69ff33..4a1fca139c 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -728,6 +728,13 @@ static void virt_machine_init(MachineState *machine)
   virt_memmap[VIRT_MROM].size, kernel_entry,
   fdt_load_addr, machine->fdt);
 
+/*
+ * Only direct boot kernel is currently supported for KVM VM,
+ * So here setup kernel start address and fdt address.
+ * TODO:Support firmware loading and integrate to TCG start
+ */
+riscv_setup_direct_kernel(kernel_entry, fdt_load_addr);
+
 /* SiFive Test MMIO device */
 sifive_test_create(memmap[VIRT_TEST].base);
 
diff --git a/include/hw/riscv/boot.h b/include/hw/riscv/boot.h
index 11a21dd584..28d838cc29 100644
--- a/include/hw/riscv/boot.h
+++ b/include/hw/riscv/boot.h
@@ -51,5 +51,6 @@ void riscv_rom_copy_firmware_info(MachineState *machine, 
hwaddr rom_base,
   hwaddr rom_size,
   uint32_t reset_vec_size,
   uint64_t kernel_entry);
+void riscv_setup_direct_kernel(hwaddr kernel_addr, hwaddr fdt_addr);
 
 #endif /* RISCV_BOOT_H */
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 7d6ed80f6b..dd34ab4978 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -29,6 +29,8 @@
 #include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
 #include "fpu/softfloat-helpers.h"
+#include "sysemu/kvm.h"
+#include "kvm_riscv.h"
 
 /* RISC-V CPU definitions */
 
@@ -361,6 +363,12 @@ static void riscv_cpu_reset(DeviceState *dev)
 cs->exception_index = EXCP_NONE;
 env->load_res = -1;
 set_default_nan_mode(1, &env->fp_status);
+
+#ifndef CONFIG_USER_ONLY
+if (kvm_enabled()) {
+kvm_riscv_reset_vcpu(cpu);
+}
+#endif
 }
 
 static void riscv_cpu_disas_set_info(CPUState *s, disassemble_info *info)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0a33d387ba..a489d94187 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -243,6 +243,9 @@ struct CPURISCVState {
 
 /* Fields from here on are preserved across CPU reset. */
 QEMUTimer *timer; /* Internal timer */
+
+hwaddr kernel_addr;
+hwaddr fdt_addr;
 };
 
 OBJECT_DECLARE_TYPE(RISCVCPU, RISCVCPUClass,
diff --git a/target/riscv/kvm-stub.c b/target/riscv/kvm-stub.c
new file mode 100644
index 00..39b96fe3f4
--- /dev/null
+++ b/target/riscv/kvm-stub.c
@@ -0,0 +1,25 @@
+/*
+ * QEMU KVM RISC-V specific function stubs
+ *
+ * Copyright (c) 2020 Huawei Technologies Co., Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "kvm_riscv.h"
+
+void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
+{
+abort();
+}
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 9d1441952a..79c931acb4 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -37,6 +37,7 @@
 #include "hw/irq.h"
 #include "qemu/log.h"
 #include "hw/loader.h"
+#include "kvm_riscv.h"
 
 static __u64 kvm_riscv_reg_id(CPURISCVState *env, __u64 type, __u64 idx)
 {
@@ -440,6 +441,18 @@ int kvm_arch_handle_exit(CPUState

[PATCH RFC v5 05/12] target/riscv: Implement kvm_arch_put_registers

2021-04-11 Thread Yifei Jiang

Put GPR, CSR and FP registers to KVM with the KVM_SET_ONE_REG ioctl.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/kvm.c | 142 -
 1 file changed, 141 insertions(+), 1 deletion(-)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 63485d7b65..9d1441952a 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -85,6 +85,31 @@ static int kvm_riscv_get_regs_core(CPUState *cs)
 return ret;
 }
 
+static int kvm_riscv_put_regs_core(CPUState *cs)
+{
+int ret = 0;
+int i;
+target_ulong reg;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+reg = env->pc;
+ret = kvm_set_one_reg(cs, RISCV_CORE_REG(env, regs.pc), &reg);
+if (ret) {
+return ret;
+}
+
+for (i = 1; i < 32; i++) {
+__u64 id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CORE, i);
+reg = env->gpr[i];
+ret = kvm_set_one_reg(cs, id, &reg);
+if (ret) {
+return ret;
+}
+}
+
+return ret;
+}
+
 static int kvm_riscv_get_regs_csr(CPUState *cs)
 {
 int ret = 0;
@@ -148,6 +173,70 @@ static int kvm_riscv_get_regs_csr(CPUState *cs)
 return ret;
 }
 
+static int kvm_riscv_put_regs_csr(CPUState *cs)
+{
+int ret = 0;
+target_ulong reg;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+reg = env->mstatus;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(env, sstatus), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->mie;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(env, sie), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->stvec;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(env, stvec), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->sscratch;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(env, sscratch), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->sepc;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(env, sepc), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->scause;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(env, scause), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->sbadaddr;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(env, stval), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->mip;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(env, sip), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->satp;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(env, satp), &reg);
+if (ret) {
+return ret;
+}
+
+return ret;
+}
+
+
 static int kvm_riscv_get_regs_fp(CPUState *cs)
 {
 int ret = 0;
@@ -181,6 +270,40 @@ static int kvm_riscv_get_regs_fp(CPUState *cs)
 return ret;
 }
 
+static int kvm_riscv_put_regs_fp(CPUState *cs)
+{
+int ret = 0;
+int i;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+if (riscv_has_ext(env, RVD)) {
+uint64_t reg;
+for (i = 0; i < 32; i++) {
+reg = env->fpr[i];
+ret = kvm_set_one_reg(cs, RISCV_FP_D_REG(env, i), &reg);
+if (ret) {
+return ret;
+}
+}
+return ret;
+}
+
+if (riscv_has_ext(env, RVF)) {
+uint32_t reg;
+for (i = 0; i < 32; i++) {
+reg = env->fpr[i];
+ret = kvm_set_one_reg(cs, RISCV_FP_F_REG(env, i), &reg);
+if (ret) {
+return ret;
+}
+}
+return ret;
+}
+
+return ret;
+}
+
+
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
 };
@@ -209,7 +332,24 @@ int kvm_arch_get_registers(CPUState *cs)
 
 int kvm_arch_put_registers(CPUState *cs, int level)
 {
-return 0;
+int ret = 0;
+
+ret = kvm_riscv_put_regs_core(cs);
+if (ret) {
+return ret;
+}
+
+ret = kvm_riscv_put_regs_csr(cs);
+if (ret) {
+return ret;
+}
+
+ret = kvm_riscv_put_regs_fp(cs);
+if (ret) {
+return ret;
+}
+
+return ret;
 }
 
 int kvm_arch_release_virq_post(int virq)
-- 
2.19.1
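
The CSR put path in this patch repeats the same "copy, set, check" step nine times. As a purely illustrative alternative, the sketch below drives the same sequence from a table and stops at the first error. The state structure, the register numbering and mock_set_one_reg() are hypothetical and stand in for QEMU's CPURISCVState and kvm_set_one_reg(); this is not what the series does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest CSR state, in the order the patch writes it. */
struct csr_state {
    uint64_t sstatus, sie, stvec, sscratch, sepc, scause, stval, sip, satp;
};

enum { N_CSRS = 9 };

/* Mock of a per-register setter: records the value, fails on request. */
static uint64_t mock_regs[N_CSRS];
static int mock_fail_at = -1;

static int mock_set_one_reg(int id, const uint64_t *val)
{
    if (id == mock_fail_at) {
        return -1;
    }
    mock_regs[id] = *val;
    return 0;
}

/* Write every CSR in table order and return the first error, if any. */
static int put_regs_csr(const struct csr_state *s)
{
    assert(s != NULL);
    const uint64_t *fields[N_CSRS] = {
        &s->sstatus, &s->sie, &s->stvec, &s->sscratch, &s->sepc,
        &s->scause, &s->stval, &s->sip, &s->satp,
    };

    for (int i = 0; i < N_CSRS; i++) {
        int ret = mock_set_one_reg(i, fields[i]);
        if (ret) {
            return ret;
        }
    }
    return 0;
}
```

The trade-off is less repetition against a little indirection. The explicit form in the patch is easier to grep for a single register.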

[PATCH RFC v5 07/12] hw/riscv: PLIC update external interrupt by KVM when kvm enabled

2021-04-11 Thread Yifei Jiang

Only the supervisor external interrupt is currently supported.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 hw/intc/sifive_plic.c| 29 -
 target/riscv/kvm-stub.c  |  5 +
 target/riscv/kvm.c   | 20 
 target/riscv/kvm_riscv.h |  1 +
 4 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/hw/intc/sifive_plic.c b/hw/intc/sifive_plic.c
index 97a1a27a9a..2746eb7a05 100644
--- a/hw/intc/sifive_plic.c
+++ b/hw/intc/sifive_plic.c
@@ -31,6 +31,8 @@
 #include "target/riscv/cpu.h"
 #include "sysemu/sysemu.h"
 #include "migration/vmstate.h"
+#include "sysemu/kvm.h"
+#include "kvm_riscv.h"
 
 #define RISCV_DEBUG_PLIC 0
 
@@ -147,15 +149,24 @@ static void sifive_plic_update(SiFivePLICState *plic)
 continue;
 }
 int level = sifive_plic_irqs_pending(plic, addrid);
-switch (mode) {
-case PLICMode_M:
-riscv_cpu_update_mip(RISCV_CPU(cpu), MIP_MEIP, 
BOOL_TO_MASK(level));
-break;
-case PLICMode_S:
-riscv_cpu_update_mip(RISCV_CPU(cpu), MIP_SEIP, 
BOOL_TO_MASK(level));
-break;
-default:
-break;
+if (kvm_enabled()) {
+if (mode == PLICMode_M) {
+continue;
+}
+kvm_riscv_set_irq(RISCV_CPU(cpu), IRQ_S_EXT, level);
+} else {
+switch (mode) {
+case PLICMode_M:
+riscv_cpu_update_mip(RISCV_CPU(cpu),
+ MIP_MEIP, BOOL_TO_MASK(level));
+break;
+case PLICMode_S:
+riscv_cpu_update_mip(RISCV_CPU(cpu),
+ MIP_SEIP, BOOL_TO_MASK(level));
+break;
+default:
+break;
+}
 }
 }
 
diff --git a/target/riscv/kvm-stub.c b/target/riscv/kvm-stub.c
index 39b96fe3f4..4e8fc31a21 100644
--- a/target/riscv/kvm-stub.c
+++ b/target/riscv/kvm-stub.c
@@ -23,3 +23,8 @@ void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
 {
 abort();
 }
+
+void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level)
+{
+abort();
+}
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 79c931acb4..da63535812 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -453,6 +453,26 @@ void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
 env->gpr[11] = cpu->env.fdt_addr;  /* a1 */
 }
 
+void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level)
+{
+int ret;
+unsigned virq = level ? KVM_INTERRUPT_SET : KVM_INTERRUPT_UNSET;
+
+if (irq != IRQ_S_EXT) {
+return;
+}
+
+if (!kvm_enabled()) {
+return;
+}
+
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_INTERRUPT, &virq);
+if (ret < 0) {
+perror("Set irq failed");
+abort();
+}
+}
+
 bool kvm_arch_cpu_check_are_resettable(void)
 {
 return true;
diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h
index f38c82bf59..ed281bdce0 100644
--- a/target/riscv/kvm_riscv.h
+++ b/target/riscv/kvm_riscv.h
@@ -20,5 +20,6 @@
 #define QEMU_KVM_RISCV_H
 
 void kvm_riscv_reset_vcpu(RISCVCPU *cpu);
+void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level);
 
 #endif
-- 
2.19.1
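
The decision made in sifive_plic_update() under KVM can be reduced to a small pure function, sketched below. The two KVM_INTERRUPT_* values are assumptions mirroring the Linux RISC-V KVM uapi header, and the enum and function names are hypothetical. The sketch only computes the ioctl argument and does not issue any ioctl.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, mirroring the Linux RISC-V KVM uapi header. */
#define KVM_INTERRUPT_SET   (-1U)
#define KVM_INTERRUPT_UNSET (-2U)

enum plic_mode { PLIC_MODE_M, PLIC_MODE_S };  /* hypothetical names */

/*
 * Decide what to inject for one PLIC context.
 * Returns false when nothing should be sent (M-mode contexts are skipped,
 * as in the patch); otherwise stores the KVM_INTERRUPT argument in *virq.
 */
static bool plic_kvm_request(enum plic_mode mode, int level, unsigned *virq)
{
    assert(virq != 0);
    if (mode == PLIC_MODE_M) {
        return false;
    }
    *virq = level ? KVM_INTERRUPT_SET : KVM_INTERRUPT_UNSET;
    return true;
}
```

Because the request is level based, a context whose pending state drops to zero produces an explicit UNSET rather than no request at all.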

[PATCH RFC v5 04/12] target/riscv: Implement kvm_arch_get_registers

2021-04-11 Thread Yifei Jiang

Get GPR, CSR and FP registers from KVM with the KVM_GET_ONE_REG ioctl.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/kvm.c | 150 -
 1 file changed, 149 insertions(+), 1 deletion(-)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 0d924be33f..63485d7b65 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -50,13 +50,161 @@ static __u64 kvm_riscv_reg_id(CPURISCVState *env, __u64 
type, __u64 idx)
 return id;
 }
 
+#define RISCV_CORE_REG(env, name)  kvm_riscv_reg_id(env, KVM_REG_RISCV_CORE, \
+ KVM_REG_RISCV_CORE_REG(name))
+
+#define RISCV_CSR_REG(env, name)  kvm_riscv_reg_id(env, KVM_REG_RISCV_CSR, \
+ KVM_REG_RISCV_CSR_REG(name))
+
+#define RISCV_FP_F_REG(env, idx)  kvm_riscv_reg_id(env, KVM_REG_RISCV_FP_F, 
idx)
+
+#define RISCV_FP_D_REG(env, idx)  kvm_riscv_reg_id(env, KVM_REG_RISCV_FP_D, 
idx)
+
+static int kvm_riscv_get_regs_core(CPUState *cs)
+{
+int ret = 0;
+int i;
+target_ulong reg;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+ret = kvm_get_one_reg(cs, RISCV_CORE_REG(env, regs.pc), &reg);
+if (ret) {
+return ret;
+}
+env->pc = reg;
+
+for (i = 1; i < 32; i++) {
+__u64 id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CORE, i);
+ret = kvm_get_one_reg(cs, id, &reg);
+if (ret) {
+return ret;
+}
+env->gpr[i] = reg;
+}
+
+return ret;
+}
+
+static int kvm_riscv_get_regs_csr(CPUState *cs)
+{
+int ret = 0;
+target_ulong reg;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(env, sstatus), &reg);
+if (ret) {
+return ret;
+}
+env->mstatus = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(env, sie), &reg);
+if (ret) {
+return ret;
+}
+env->mie = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(env, stvec), &reg);
+if (ret) {
+return ret;
+}
+env->stvec = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(env, sscratch), &reg);
+if (ret) {
+return ret;
+}
+env->sscratch = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(env, sepc), &reg);
+if (ret) {
+return ret;
+}
+env->sepc = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(env, scause), &reg);
+if (ret) {
+return ret;
+}
+env->scause = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(env, stval), &reg);
+if (ret) {
+return ret;
+}
+env->sbadaddr = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(env, sip), &reg);
+if (ret) {
+return ret;
+}
+env->mip = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(env, satp), &reg);
+if (ret) {
+return ret;
+}
+env->satp = reg;
+
+return ret;
+}
+
+static int kvm_riscv_get_regs_fp(CPUState *cs)
+{
+int ret = 0;
+int i;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+if (riscv_has_ext(env, RVD)) {
+uint64_t reg;
+for (i = 0; i < 32; i++) {
+ret = kvm_get_one_reg(cs, RISCV_FP_D_REG(env, i), &reg);
+if (ret) {
+return ret;
+}
+env->fpr[i] = reg;
+}
+return ret;
+}
+
+if (riscv_has_ext(env, RVF)) {
+uint32_t reg;
+for (i = 0; i < 32; i++) {
+ret = kvm_get_one_reg(cs, RISCV_FP_F_REG(env, i), &reg);
+if (ret) {
+return ret;
+}
+env->fpr[i] = reg;
+}
+return ret;
+}
+
+return ret;
+}
+
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
 };
 
 int kvm_arch_get_registers(CPUState *cs)
 {
-return 0;
+int ret = 0;
+
+ret = kvm_riscv_get_regs_core(cs);
+if (ret) {
+return ret;
+}
+
+ret = kvm_riscv_get_regs_csr(cs);
+if (ret) {
+return ret;
+}
+
+ret = kvm_riscv_get_regs_fp(cs);
+if (ret) {
+return ret;
+}
+
+return ret;
 }
 
 int kvm_arch_put_registers(CPUState *cs, int level)
-- 
2.19.1
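
The RISCV_*_REG macros in this patch all funnel into kvm_riscv_reg_id(), whose body is not visible in this message. The sketch below shows how such a 64-bit ONE_REG identifier is typically composed from an architecture tag, a size field, a register class and an index. The constants are assumptions taken from the Linux KVM uapi headers and renamed to avoid clashing with them; riscv_reg_id() is a hypothetical stand-in.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants, modelled on the Linux KVM uapi headers. */
#define REG_ARCH_RISCV 0x8000000000000000ULL
#define REG_SIZE_U32   0x0020000000000000ULL
#define REG_SIZE_U64   0x0030000000000000ULL
#define REG_TYPE_SHIFT 24
#define REG_TYPE_CORE  (0x02ULL << REG_TYPE_SHIFT)
#define REG_TYPE_CSR   (0x03ULL << REG_TYPE_SHIFT)

/* Compose a register ID; the size field follows the guest XLEN. */
static uint64_t riscv_reg_id(int xlen, uint64_t type, uint64_t idx)
{
    assert(xlen == 32 || xlen == 64);
    uint64_t size = (xlen == 32) ? REG_SIZE_U32 : REG_SIZE_U64;
    return REG_ARCH_RISCV | size | type | idx;
}
```

If index 0 of the core class is pc and indices 1 to 31 are x1 to x31 (an assumption about the kernel's register struct layout), that would explain why the GPR loops in the patch start at 1: x0 is hard-wired to zero and has no slot of its own.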

[PATCH RFC v5 00/12] Add riscv kvm accel support

2021-04-11 Thread Yifei Jiang

This series adds both riscv32 and riscv64 KVM support and implements
migration for riscv. It is based on the KVM riscv kernel series, which
has not been accepted yet:
https://github.com/kvm-riscv/linux (latest version v17).

Because this series depends on those pending kernel changes, it is
posted as RFC until that dependency has been resolved.

Steps to use this:
1. Build QEMU for emulation
$ ./configure --target-list=riscv64-softmmu
$ make -j$(nproc)

2. Build kernel
https://github.com/kvm-riscv/linux

3. Build QEMU for the VM
Cross-build with the riscv toolchain.
$ PKG_CONFIG_LIBDIR=
$ export PKG_CONFIG_SYSROOT_DIR=
$ ./configure --target-list=riscv64-softmmu --enable-kvm \
--cross-prefix=riscv64-linux-gnu- --disable-libiscsi --disable-glusterfs \
--disable-libusb --disable-usb-redir --audio-drv-list= --disable-opengl \
--disable-libxml2
$ make -j$(nproc)

4. Start emulation
$ ./qemu-system-riscv64 -M virt -m 4096M -cpu rv64,x-h=true -nographic \
-name guest=riscv-hyp,debug-threads=on \
-smp 4 \
-bios ./fw_jump.bin \
-kernel ./Image \
-drive file=./hyp.img,format=raw,id=hd0 \
-device virtio-blk-device,drive=hd0 \
-append "root=/dev/vda rw console=ttyS0 earlycon=sbi"

5. Start the KVM-accelerated QEMU VM inside the emulation
$ ./qemu-system-riscv64 -M virt,accel=kvm -m 1024M -cpu host -nographic \
-name guest=riscv-guest \
-smp 2 \
-bios none \
-kernel ./Image \
-drive file=./guest.img,format=raw,id=hd0 \
-device virtio-blk-device,drive=hd0 \
-append "root=/dev/vda rw console=ttyS0 earlycon=sbi"

Changes since RFC v4
- Rebase on QEMU v6.0.0-rc2 and kvm-riscv linux v17.
- Remove time scaling support, because the software solution is incomplete
  and causes unacceptable performance degradation. We will post a better
  solution later.
- Revise according to Alistair's review comments.
  - Remove compile time XLEN checks in kvm_riscv_reg_id
  - Surround TYPE_RISCV_CPU_HOST definition by CONFIG_KVM and share
it between RV32 and RV64.
  - Add kvm-stub.c to reduce unnecessary compilation checks.
  - Add riscv_setup_direct_kernel() to direct boot kernel for KVM.

Changes since RFC v3
- Rebase on QEMU v5.2.0-rc2 and kvm-riscv linux v15.
- Add time scaling support (new patches 13, 14 and 15).
- Fix the bug that guest vm can't reboot.

Changes since RFC v2
- Fix checkpatch error at target/riscv/sbi_ecall_interface.h.
- Add riscv migration support.

Changes since RFC v1
- Add separate SBI ecall interface header.
- Add riscv32 kvm accel support.

Yifei Jiang (12):
  linux-header: Update linux/kvm.h
  target/riscv: Add target/riscv/kvm.c to place the public kvm interface
  target/riscv: Implement function kvm_arch_init_vcpu
  target/riscv: Implement kvm_arch_get_registers
  target/riscv: Implement kvm_arch_put_registers
  target/riscv: Support start kernel directly by KVM
  hw/riscv: PLIC update external interrupt by KVM when kvm enabled
  target/riscv: Handle KVM_EXIT_RISCV_SBI exit
  target/riscv: Add host cpu type
  target/riscv: Add kvm_riscv_get/put_regs_timer
  target/riscv: Implement virtual time adjusting with vm state changing
  target/riscv: Support virtual time context synchronization

 hw/intc/sifive_plic.c  |  29 +-
 hw/riscv/boot.c|  11 +
 hw/riscv/virt.c|   7 +
 include/hw/riscv/boot.h|   1 +
 linux-headers/linux/kvm.h  |  97 +
 meson.build|   2 +
 target/riscv/cpu.c |  17 +
 target/riscv/cpu.h |  10 +
 target/riscv/kvm-stub.c|  30 ++
 target/riscv/kvm.c | 605 +
 target/riscv/kvm_riscv.h   |  25 ++
 target/riscv/machine.c |  14 +
 target/riscv/meson.build   |   1 +
 target/riscv/sbi_ecall_interface.h |  72 
 14 files changed, 912 insertions(+), 9 deletions(-)
 create mode 100644 target/riscv/kvm-stub.c
 create mode 100644 target/riscv/kvm.c
 create mode 100644 target/riscv/kvm_riscv.h
 create mode 100644 target/riscv/sbi_ecall_interface.h

-- 
2.19.1

[PATCH RFC v5 11/12] target/riscv: Implement virtual time adjusting with vm state changing

2021-04-11 Thread Yifei Jiang

Virtual time should follow VM state changes. When a VM is stopped,
guest virtual time should stop counting and kvm_timer should be
stopped. When the VM is resumed, guest virtual time should continue
to count and kvm_timer should be restored.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/kvm.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index ec693795ce..50328c537e 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -40,6 +40,7 @@
 #include "kvm_riscv.h"
 #include "sbi_ecall_interface.h"
 #include "chardev/char-fe.h"
+#include "sysemu/runstate.h"
 
 static __u64 kvm_riscv_reg_id(CPURISCVState *env, __u64 type, __u64 idx)
 {
@@ -448,6 +449,17 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
 return cpu->cpu_index;
 }
 
+static void kvm_riscv_vm_state_change(void *opaque, int running, RunState 
state)
+{
+CPUState *cs = opaque;
+
+if (running) {
+kvm_riscv_put_regs_timer(cs);
+} else {
+kvm_riscv_get_regs_timer(cs);
+}
+}
+
 void kvm_arch_init_irq_routing(KVMState *s)
 {
 }
@@ -460,6 +472,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
 CPURISCVState *env = &cpu->env;
 __u64 id;
 
+qemu_add_vm_change_state_handler(kvm_riscv_vm_state_change, cs);
+
 id = kvm_riscv_reg_id(env, KVM_REG_RISCV_CONFIG, 
KVM_REG_RISCV_CONFIG_REG(isa));
 ret = kvm_get_one_reg(cs, id, &isa);
 if (ret) {
-- 
2.19.1
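
The effect of the handler in this patch can be shown with a small mock. The timer structure and the get/put helpers below are hypothetical stand-ins for kvm_riscv_get_regs_timer() and kvm_riscv_put_regs_timer(), whose bodies belong to an earlier patch and are not shown here. The sketch assumes that "get" snapshots the in-kernel timer and "put" writes the snapshot back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timer registers held for one vCPU. */
struct timer_regs {
    uint64_t time;
    uint64_t compare;
};

struct vcpu {
    struct timer_regs hw;     /* what the hypervisor currently holds */
    struct timer_regs saved;  /* userspace copy taken while stopped */
};

static void get_regs_timer(struct vcpu *v) { v->saved = v->hw; }
static void put_regs_timer(struct vcpu *v) { v->hw = v->saved; }

/* Same shape as the handler in the patch: restore on run, save on stop. */
static void vm_state_change(void *opaque, int running)
{
    struct vcpu *v = opaque;

    assert(v != 0);
    if (running) {
        put_regs_timer(v);
    } else {
        get_regs_timer(v);
    }
}
```

With this pairing, whatever happens to the in-kernel counter while the VM is stopped is overwritten on resume, so the guest sees time continue from the point where it was paused.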

[PATCH RFC v5 08/12] target/riscv: Handle KVM_EXIT_RISCV_SBI exit

2021-04-11 Thread Yifei Jiang

Use char-fe to handle the console SBI calls, which implement early
console I/O when 'earlycon=sbi' is added to the kernel parameters.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/kvm.c | 42 -
 target/riscv/sbi_ecall_interface.h | 72 ++
 2 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 target/riscv/sbi_ecall_interface.h

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index da63535812..f9707157e7 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -38,6 +38,8 @@
 #include "qemu/log.h"
 #include "hw/loader.h"
 #include "kvm_riscv.h"
+#include "sbi_ecall_interface.h"
+#include "chardev/char-fe.h"
 
 static __u64 kvm_riscv_reg_id(CPURISCVState *env, __u64 type, __u64 idx)
 {
@@ -436,9 +438,47 @@ bool kvm_arch_stop_on_emulation_error(CPUState *cs)
 return true;
 }
 
+static int kvm_riscv_handle_sbi(struct kvm_run *run)
+{
+int ret = 0;
+unsigned char ch;
+switch (run->riscv_sbi.extension_id) {
+case SBI_EXT_0_1_CONSOLE_PUTCHAR:
+ch = run->riscv_sbi.args[0];
+qemu_chr_fe_write(serial_hd(0)->be, &ch, sizeof(ch));
+break;
+case SBI_EXT_0_1_CONSOLE_GETCHAR:
+ret = qemu_chr_fe_read_all(serial_hd(0)->be, &ch, sizeof(ch));
+if (ret == sizeof(ch)) {
+run->riscv_sbi.args[0] = ch;
+} else {
+run->riscv_sbi.args[0] = -1;
+}
+break;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s: un-handled SBI EXIT, specific reasons is %lu\n",
+  __func__, run->riscv_sbi.extension_id);
+ret = -1;
+break;
+}
+return ret;
+}
+
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
-return 0;
+int ret = 0;
+switch (run->exit_reason) {
+case KVM_EXIT_RISCV_SBI:
+ret = kvm_riscv_handle_sbi(run);
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
+  __func__, run->exit_reason);
+ret = -1;
+break;
+}
+return ret;
 }
 
 void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
diff --git a/target/riscv/sbi_ecall_interface.h 
b/target/riscv/sbi_ecall_interface.h
new file mode 100644
index 00..fb1a3fa8f2
--- /dev/null
+++ b/target/riscv/sbi_ecall_interface.h
@@ -0,0 +1,72 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *   Anup Patel 
+ */
+
+#ifndef __SBI_ECALL_INTERFACE_H__
+#define __SBI_ECALL_INTERFACE_H__
+
+/* clang-format off */
+
+/* SBI Extension IDs */
+#define SBI_EXT_0_1_SET_TIMER   0x0
+#define SBI_EXT_0_1_CONSOLE_PUTCHAR 0x1
+#define SBI_EXT_0_1_CONSOLE_GETCHAR 0x2
+#define SBI_EXT_0_1_CLEAR_IPI   0x3
+#define SBI_EXT_0_1_SEND_IPI0x4
+#define SBI_EXT_0_1_REMOTE_FENCE_I  0x5
+#define SBI_EXT_0_1_REMOTE_SFENCE_VMA   0x6
+#define SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID 0x7
+#define SBI_EXT_0_1_SHUTDOWN0x8
+#define SBI_EXT_BASE0x10
+#define SBI_EXT_TIME0x54494D45
+#define SBI_EXT_IPI 0x735049
+#define SBI_EXT_RFENCE  0x52464E43
+#define SBI_EXT_HSM 0x48534D
+
+/* SBI function IDs for BASE extension*/
+#define SBI_EXT_BASE_GET_SPEC_VERSION   0x0
+#define SBI_EXT_BASE_GET_IMP_ID 0x1
+#define SBI_EXT_BASE_GET_IMP_VERSION0x2
+#define SBI_EXT_BASE_PROBE_EXT  0x3
+#define SBI_EXT_BASE_GET_MVENDORID  0x4
+#define SBI_EXT_BASE_GET_MARCHID0x5
+#define SBI_EXT_BASE_GET_MIMPID 0x6
+
+/* SBI function IDs for TIME extension*/
+#define SBI_EXT_TIME_SET_TIMER  0x0
+
+/* SBI function IDs for IPI extension*/
+#define SBI_EXT_IPI_SEND_IPI0x0
+
+/* SBI function IDs for RFENCE extension*/
+#define SBI_EXT_RFENCE_REMOTE_FENCE_I   0x0
+#define SBI_EXT_RFENCE_REMOTE_SFENCE_VMA0x1
+#define SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID  0x2
+#define SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA   0x3
+#define SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID 0x4
+#define SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA   0x5
+#define SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID 0x6
+
+/* SBI function IDs for HSM extension */
+#define SBI_EXT_HSM_HART_START  0x0
+#define SBI_EXT_HSM_HART_STOP   0x1
+#define SBI_EXT_HSM_HART_GET_STATUS 0x2
+
+#define SBI_HSM_HART_STATUS_STARTED 0x0
+#define SBI_HSM_HART_STATUS_STOPPED 0x1
+#define SBI_HSM_HART_STATUS_START_PENDING   0x2
+#define SBI_HSM_HART_STATUS_STOP_PENDING0x3
+
+#define SBI_SPEC_VERSION_MAJOR_OFFSET   24
+#define SBI_SPEC_VERSION_MAJOR_MASK 0x7f
+#define SBI_SPEC_VERSION_MINOR_MASK 0xff
+#define SBI_EXT_VENDOR_START0x0900
+#define SBI_EXT_VENDOR_END  0x09FF
+/* clang-format on */
+
+#endif
-- 
2.19.1
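
For readers who want to try the dispatch logic outside QEMU, here is a minimal sketch with a mock console in place of the char-fe backend. The exit structure is a trimmed, hypothetical copy of the riscv_sbi payload, and the console helpers are invented for the example. Unlike the handler in the patch, the sketch returns 0 for every handled call, including GETCHAR; that is a choice made here, not a statement about the series.

```c
#include <assert.h>

#define SBI_EXT_0_1_CONSOLE_PUTCHAR 0x1
#define SBI_EXT_0_1_CONSOLE_GETCHAR 0x2

/* Hypothetical, trimmed-down copy of the riscv_sbi exit payload. */
struct sbi_exit {
    unsigned long extension_id;
    unsigned long args[6];
};

/* Mock console backend: a fixed input string and a small output buffer. */
static const char *console_in = "";
static char console_out[16];
static int console_out_len;

/* Read one byte from the mock input; returns 1 on success, 0 when empty. */
static int console_read(unsigned char *ch)
{
    if (*console_in == '\0') {
        return 0;
    }
    *ch = (unsigned char)*console_in++;
    return 1;
}

/* Handle one SBI exit: 0 if handled, -1 for an unknown extension. */
static int handle_sbi(struct sbi_exit *e)
{
    unsigned char ch;

    assert(e != 0);
    switch (e->extension_id) {
    case SBI_EXT_0_1_CONSOLE_PUTCHAR:
        if (console_out_len < (int)sizeof(console_out)) {
            console_out[console_out_len++] = (char)e->args[0];
        }
        return 0;
    case SBI_EXT_0_1_CONSOLE_GETCHAR:
        /* -1 signals "no character available" to the guest. */
        e->args[0] = (console_read(&ch) == 1) ? ch : (unsigned long)-1;
        return 0;
    default:
        return -1;
    }
}
```

Only the two legacy console extensions are covered, matching the patch. Every other extension ID falls through to the error path.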
