date:20250414

Re: [PATCH v3 2/7] arm/mpu: Provide access to the MPU region from the C code

2025-04-14 Thread Julien Grall


Hi Luca,

On 15/04/2025 00:07, Luca Fancellu wrote:

Hi Julien,


On 14 Apr 2025, at 12:41, Julien Grall  wrote:

Hi Luca,

On 11/04/2025 23:56, Luca Fancellu wrote:

Implement some utility functions in order to access the MPU regions
from the C world.
Signed-off-by: Luca Fancellu 
---
v3 changes:
  - Moved PRBAR0_EL2/PRLAR0_EL2 to arm64 specific
  - Modified prepare_selector() to be easily made a NOP
for Arm32, which can address up to 32 regions without
changing the selector; 32 is also its maximum number
of MPU regions.
---
---
  xen/arch/arm/include/asm/arm64/mpu.h |   7 ++
  xen/arch/arm/include/asm/mpu.h   |   1 +
  xen/arch/arm/include/asm/mpu/mm.h|  24 +
  xen/arch/arm/mpu/mm.c| 125 +++
  4 files changed, 157 insertions(+)
diff --git a/xen/arch/arm/include/asm/arm64/mpu.h 
b/xen/arch/arm/include/asm/arm64/mpu.h
index 4d2bd7d7877f..b4e1ecdf741d 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -8,6 +8,13 @@
#ifndef __ASSEMBLY__
  +/*
+ * The following are needed for the case generators GENERATE_WRITE_PR_REG_CASE
+ * and GENERATE_READ_PR_REG_CASE with num==0
+ */
+#define PRBAR0_EL2 PRBAR_EL2
+#define PRLAR0_EL2 PRLAR_EL2


Rather than aliasing, shouldn't we just rename PR{B,L}AR_EL2 to PR{B,L}AR0_EL2? 
This would avoid the code mixing the two.


PR{B,L}AR0_ELx does not really exist; PR{B,L}AR<n>_ELx exists for n=1..15. 
Here I’m only using this “alias” for the generator,
but PR{B,L}AR_EL2 are the real registers.


In this case, can PR{B,L}AR0_EL2 be defined in mm.c so they are not used 
anywhere else?







+
  /* Protection Region Base Address Register */
  typedef union {
  struct __packed {
diff --git a/xen/arch/arm/include/asm/mpu.h b/xen/arch/arm/include/asm/mpu.h
index e148c705b82c..59ff22c804c1 100644
--- a/xen/arch/arm/include/asm/mpu.h
+++ b/xen/arch/arm/include/asm/mpu.h
@@ -13,6 +13,7 @@
  #define MPU_REGION_SHIFT  6
  #define MPU_REGION_ALIGN  (_AC(1, UL) << MPU_REGION_SHIFT)
  #define MPU_REGION_MASK   (~(MPU_REGION_ALIGN - 1))
+#define MPU_REGION_RES0   (0xFFFULL << 52)
#define NUM_MPU_REGIONS_SHIFT   8
  #define NUM_MPU_REGIONS (_AC(1, UL) << NUM_MPU_REGIONS_SHIFT)
diff --git a/xen/arch/arm/include/asm/mpu/mm.h 
b/xen/arch/arm/include/asm/mpu/mm.h
index 86f33d9836b7..5cabe9d111ce 100644
--- a/xen/arch/arm/include/asm/mpu/mm.h
+++ b/xen/arch/arm/include/asm/mpu/mm.h
@@ -8,6 +8,7 @@
  #include 
  #include 
  #include 
+#include 
extern struct page_info *frame_table;
  @@ -29,6 +30,29 @@ static inline struct page_info *virt_to_page(const void *v)
  return mfn_to_page(mfn);
  }
  +/* Utility function to be used whenever MPU regions are modified */
+static inline void context_sync_mpu(void)
+{
+/*
+ * ARM DDI 0600B.a, C1.7.1
+ * Writes to MPU registers are only guaranteed to be visible following a
+ * Context synchronization event and DSB operation.


I know we discussed this before. I find it odd that the specification says "context 
synchronization event and DSB operation". At least to me, it implies "isb + dsb", not 
the other way around. Has this been clarified in a newer version of the specification?


Unfortunately no. I’m looking at the latest one (Arm® Architecture Reference 
Manual Supplement Armv8, for R-profile AArch64 architecture, 0600B.a) and it has 
the same wording. However,
I spoke internally with Cortex-R architects and they told me to use DSB+ISB.


So you didn't speak with the Armv8-R architects? Asking because we are 
writing code for Armv8-R (so not only Cortex-R).


In any case, I still think this is something that needs to be clarified
in the specification, so that people who don't have access to Arm's 
internal architects know the correct sequence. Is this something you can 
follow up on?


Cheers,

--
Julien Grall
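
A minimal sketch of the DSB-then-ISB ordering that Luca reports being told to use, assuming a GCC/Clang-style compiler. The name context_sync_mpu_sketch() is hypothetical and this is not the patch's implementation; the ordering itself is still pending clarification in the specification, as discussed above. On non-AArch64 builds the barriers are replaced by a plain compiler barrier so the snippet still compiles.

```c
/* Hypothetical stand-in for the helper discussed in this thread. */
static inline void context_sync_mpu_sketch(void)
{
#if defined(__aarch64__)
    /* Complete the preceding MPU register writes first ... */
    __asm__ volatile("dsb sy" ::: "memory");
    /* ... then force a context synchronization event. */
    __asm__ volatile("isb" ::: "memory");
#else
    /* Not AArch64: only prevent the compiler from reordering. */
    __asm__ volatile("" ::: "memory");
#endif
}
```

A caller would invoke this once after updating one or more protection regions, rather than after every individual register write.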

Re: [PATCH v7 2/3] xen/arm32: Create the same boot-time MPU regions as arm64

2025-04-14 Thread Orzel, Michal




On 14/04/2025 18:45, Ayan Kumar Halder wrote:
> Create Boot-time MPU protection regions (similar to Armv8-R AArch64) for
> Armv8-R AArch32.
> Also, define *_PRBAR macros for arm32. The only difference from arm64 is that
> XN is 1-bit for arm32.
> Define the system registers and macros in mpu/cpregs.h.
> 
> Introduce WRITE_SYSREG_ASM() to write to system registers in assembly.
> 
> Signed-off-by: Ayan Kumar Halder 
> Reviewed-by: Luca Fancellu 
> Tested-by: Luca Fancellu 
> ---
> Changes from
> 
> v1 -
> 
> 1. enable_mpu() now sets HMAIR{0,1} registers. This is similar to what is
> being done in enable_mmu(). All the mm related configurations happen in this
> function.
> 
> 2. Fixed some typos. 
> 
> v2 -
> 1. Include the common prepare_xen_region.inc in head.S.
> 
> 2. Define LOAD_SYSREG()/STORE_SYSREG() for arm32.
> 
> v3 -
> 1. Rename STORE_SYSREG() as WRITE_SYSREG_ASM()
> 
> 2. enable_boot_cpu_mm() is defined in head.S
> 
> v4 -
> 1. *_PRBAR is moved to arm32/sysregs.h.
> 
> 2. MPU specific CP15 system registers are defined in mpu/cpregs.h. 
> 
> v5 -
> 1. WRITE_SYSREG_ASM is enclosed within #ifdef __ASSEMBLY__
> 
> 2. enable_mpu() clobbers r0 only.
> 
> 3. Definitions in mpu/cpregs.h are enclosed within ARM_32.
> 
> 4. Removed some #ifdefs and style changes.
> 
> v6 -
> 1. Coding style issues.
> 
> 2. Kept Luca's R-b and T-b as the changes should not impact the behavior.
Note for the future: Especially for T-b, it's better to drop the tags because
the series has not been tested in its current shape.

Reviewed-by: Michal Orzel 

~Michal

Re: [PATCH v7 0/3] Enable early bootup of Armv8-R AArch32 systems

2025-04-14 Thread Jan Beulich

On 14.04.2025 18:45, Ayan Kumar Halder wrote:
> Enable early booting of Armv8-R AArch32 based systems.
> 
> Added Luca's R-b in all the patches.
> Added Michal's R-b in patch 1 and 3.
> 
> Ayan Kumar Halder (3):
>   xen/arm: Move some of the functions to common file
>   xen/arm32: Create the same boot-time MPU regions as arm64
>   xen/arm32: mpu: Stubs to build MPU for arm32
> 
>  xen/arch/arm/arm32/Makefile  |   1 +
>  xen/arch/arm/arm32/mpu/Makefile  |   3 +
>  xen/arch/arm/arm32/mpu/head.S| 104 +++
>  xen/arch/arm/arm32/mpu/p2m.c |  19 +
>  xen/arch/arm/arm32/mpu/smpboot.c |  26 ++
>  xen/arch/arm/arm64/mpu/head.S|  78 +
>  xen/arch/arm/include/asm/arm32/sysregs.h |  13 ++-
>  xen/arch/arm/include/asm/arm64/sysregs.h |  13 +++
>  xen/arch/arm/include/asm/cpregs.h|   2 +
>  xen/arch/arm/include/asm/mm.h|   9 +-
>  xen/arch/arm/include/asm/mmu/mm.h|   7 ++
>  xen/arch/arm/include/asm/mpu/cpregs.h|  32 +++
>  xen/arch/arm/include/asm/mpu/mm.h|   5 ++
>  xen/arch/arm/include/asm/mpu/regions.inc |  79 +
>  xen/arch/arm/mpu/Makefile|   1 +
>  xen/arch/arm/mpu/domain_page.c   |  45 ++
>  16 files changed, 350 insertions(+), 87 deletions(-)
>  create mode 100644 xen/arch/arm/arm32/mpu/Makefile
>  create mode 100644 xen/arch/arm/arm32/mpu/head.S
>  create mode 100644 xen/arch/arm/arm32/mpu/p2m.c
>  create mode 100644 xen/arch/arm/arm32/mpu/smpboot.c
>  create mode 100644 xen/arch/arm/include/asm/mpu/cpregs.h
>  create mode 100644 xen/arch/arm/include/asm/mpu/regions.inc
>  create mode 100644 xen/arch/arm/mpu/domain_page.c

Even if we have files of this name elsewhere, it would imo be nice if new ones
still used dash(es) instead of underscore(s) in their names.

Jan

Re: linux-6.15-rc2/drivers/xen/balloon.c:346: Possible int/long mixup

2025-04-14 Thread Jürgen Groß


On 14.04.25 19:57, David Binderman wrote:

Hello there,

Static analyser cppcheck says:

linux-6.15-rc2/drivers/xen/balloon.c:346:24: style: int result is assigned to 
long variable. If the variable is long to avoid loss of information, then you 
have loss of information. [truncLongCastAssignment]

Source code is

 unsigned long i, size = (1 << order);

Maybe better code:

 unsigned long i, size = (1UL << order);



While I agree this would be better, there is no real failure possible
here. For this to cause problems you'd need to hotplug 16TB of memory
in one single block.

Nevertheless thanks for the notice.


Juergen
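
To illustrate the point above: in `1 << order` the shift is evaluated in int and only then widened, whereas `1UL << order` is evaluated in unsigned long from the start. The sketch below uses a hypothetical helper name, pages_for_order(), and is not the balloon driver's code.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of pages in a block of the given order. */
static unsigned long pages_for_order(unsigned int order)
{
    /* Shifting by the width of the type or more is undefined behaviour. */
    assert(order < sizeof(unsigned long) * CHAR_BIT);

    /* 1UL makes the shift happen in unsigned long rather than in int. */
    return 1UL << order;
}
```

With a plain `1`, an order of 31 or more would overflow a 32-bit int before the assignment, which is the case cppcheck is warning about.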

Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available

2025-04-14 Thread H. Peter Anvin

On April 14, 2025 10:48:47 AM PDT, Xin Li  wrote:
>On 4/12/2025 4:10 PM, H. Peter Anvin wrote:
>> Also, *in this specific case* IA32_SPEC_CTRL is architecturally 
>> nonserializing, i.e. WRMSR executes as WRMSRNS anyway.
>
>While the immediate form WRMSRNS could be faster because the MSR index
>is available *much* earlier in the pipeline, right?

Yes, but then it would be redundant with the virtualization support.

Re: [PATCH v3 12/16] x86/hyperlaunch: add domain id parsing to domain config

2025-04-14 Thread Stefano Stabellini

On Mon, 14 Apr 2025, Alejandro Vallejo wrote:
> Though I'm starting to get urges to rewrite many of these error handlers
> as asserts, on the basis that "why do we think it's ok to boot with
> malformed DTBs"? A safe system that doesn't boot is more helpful than an
> unsafe one that boots everything except a critical component for you to
> find later on.

It is totally OK to panic on boot if a malformed DTB was passed.  See
the number of panics in xen/arch/arm/dom0less-build.c.

Re: [PATCH v6 3/3] xen/arm32: mpu: Stubs to build MPU for arm32

2025-04-14 Thread Orzel, Michal




On 11/04/2025 13:04, Ayan Kumar Halder wrote:
> Add stubs to enable compilation.
> 
> is_xen_heap_page() and is_xen_heap_mfn() are not implemented for arm32 MPU.
> Thus, introduce the stubs for these functions in asm/mpu/mm.h and move the
> original code to asm/mmu/mm.h (as it is used for arm32 MMU based system).
> 
> Signed-off-by: Ayan Kumar Halder 
> Reviewed-by: Luca Fancellu 
Reviewed-by: Michal Orzel 

~Michal

Re: [PATCH v3 6/6] CI: Include microcode for x86 hardware jobs

2025-04-14 Thread Andrew Cooper

On 14/04/2025 6:45 pm, Anthony PERARD wrote:
> On Mon, Apr 14, 2025 at 12:09:03PM +0100, Andrew Cooper wrote:
>> diff --git a/automation/gitlab-ci/build.yaml 
>> b/automation/gitlab-ci/build.yaml
>> index 1b82b359d01f..ac5367874526 100644
>> --- a/automation/gitlab-ci/build.yaml
>> +++ b/automation/gitlab-ci/build.yaml
>> @@ -306,6 +306,7 @@ alpine-3.18-gcc-debug:
>>CONFIG_ARGO=y
>>CONFIG_UBSAN=y
>>CONFIG_UBSAN_FATAL=y
>> +  CONFIG_UCODE_SCAN_DEFAULT=y
> Is there a change

DYM "chance" ?

>  that this patch series gets backported? Because that
> new Kconfig option won't exist.

Yes, I do intend to backport this whole series in due course, and yes,
I'm aware.

> Otherwise, patch looks fine:
> Reviewed-by: Anthony PERARD 

Thanks.

~Andrew

[PATCH v2 7/8] xen/common: dom0less: introduce common domain-build.c

2025-04-14 Thread Oleksii Kurochko

Some functions of Arm's domain_build.c could be reused by dom0less or other
features connected to domain construction/build.

The following functions are moved to common:
- get_allocation_size().
- allocate_domheap_memory().
- guest_map_pages().
- allocate_bank_memory().
- add_hwdom_free_regions().
- find_unallocated_memory().
- allocate_memory().
- dtb_load().
- initrd_load().

The prototypes of dtb_load() and initrd_load() are updated to receive a pointer
to copy_to_guest_phys() as some archs require
copy_to_guest_phys_flush_dcache().

Update arm/include/asm/Makefile to generate domain-build.h for Arm as it is
used by domain-build.c.
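
As an illustration of get_allocation_size(), one of the functions listed above: it needs the largest order whose block still fits in the requested size, while get_order_from_bytes() rounds up. The sketch below uses hypothetical stand-ins (order_ceil(), order_floor(), SKETCH_PAGE_SHIFT with an assumed 4 KiB page) rather than the Xen helpers, and only shows the "add one, then subtract one" trick.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Stand-in for a round-up helper: smallest order covering `size` bytes. */
static unsigned int order_ceil(uint64_t size)
{
    uint64_t page = 1ULL << SKETCH_PAGE_SHIFT;
    uint64_t pages = (size + page - 1) >> SKETCH_PAGE_SHIFT;
    unsigned int order = 0;

    while ( (1ULL << order) < pages )
        order++;

    return order;
}

/* Largest order whose block is not bigger than `size` bytes. */
static unsigned int order_floor(uint64_t size)
{
    /* At least one full page is needed for the subtraction to be valid. */
    assert(size >= (1ULL << SKETCH_PAGE_SHIFT));

    /*
     * size + 1 pushes an exactly aligned size into the next order, so
     * subtracting 1 afterwards always yields the round-down order.
     */
    return order_ceil(size + 1) - 1;
}
```

For example, three pages (12288 bytes) round up to order 2 but round down to order 1, so the allocator takes two pages first and handles the remaining page in the next iteration.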

Signed-off-by: Oleksii Kurochko 
---
Change in v2:
 - Use xen/fdt-domain-build.h instead of asm/domain_build.h.
---
 xen/arch/arm/domain_build.c   | 397 +
 xen/common/device-tree/Makefile   |   1 +
 xen/common/device-tree/domain-build.c | 404 ++
 xen/include/xen/fdt-domain-build.h|  33 ++-
 4 files changed, 439 insertions(+), 396 deletions(-)
 create mode 100644 xen/common/device-tree/domain-build.c

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 75f048f58c..86fcaefa26 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -119,18 +119,6 @@ struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
 return vcpu_create(dom0, 0);
 }
 
-unsigned int __init get_allocation_size(paddr_t size)
-{
-/*
- * get_order_from_bytes returns the order greater than or equal to
- * the given size, but we need less than or equal. Adding one to
- * the size pushes an evenly aligned size into the next order, so
- * we can then unconditionally subtract 1 from the order which is
- * returned.
- */
-return get_order_from_bytes(size + 1) - 1;
-}
-
 /*
  * Insert the given pages into a memory bank, banks are ordered by address.
  *
@@ -417,98 +405,6 @@ static void __init allocate_memory_11(struct domain *d,
 }
 }
 
-bool __init allocate_domheap_memory(struct domain *d, paddr_t tot_size,
-alloc_domheap_mem_cb cb, void *extra)
-{
-unsigned int max_order = UINT_MAX;
-
-while ( tot_size > 0 )
-{
-unsigned int order = get_allocation_size(tot_size);
-struct page_info *pg;
-
-order = min(max_order, order);
-
-pg = alloc_domheap_pages(d, order, 0);
-if ( !pg )
-{
-/*
- * If we can't allocate one page, then it is unlikely to
- * succeed in the next iteration. So bail out.
- */
-if ( !order )
-return false;
-
-/*
- * If we can't allocate memory with order, then it is
- * unlikely to succeed in the next iteration.
- * Record the order - 1 to avoid re-trying.
- */
-max_order = order - 1;
-continue;
-}
-
-if ( !cb(d, pg, order, extra) )
-return false;
-
-tot_size -= (1ULL << (PAGE_SHIFT + order));
-}
-
-return true;
-}
-
-static bool __init guest_map_pages(struct domain *d, struct page_info *pg,
-   unsigned int order, void *extra)
-{
-gfn_t *sgfn = (gfn_t *)extra;
-int res;
-
-BUG_ON(!sgfn);
-res = guest_physmap_add_page(d, *sgfn, page_to_mfn(pg), order);
-if ( res )
-{
-dprintk(XENLOG_ERR, "Failed map pages to DOMU: %d", res);
-return false;
-}
-
-*sgfn = gfn_add(*sgfn, 1UL << order);
-
-return true;
-}
-
-bool __init allocate_bank_memory(struct kernel_info *kinfo, gfn_t sgfn,
- paddr_t tot_size)
-{
-struct membanks *mem = kernel_info_get_mem(kinfo);
-struct domain *d = kinfo->d;
-struct membank *bank;
-
-/*
- * allocate_bank_memory can be called with a tot_size of zero for
- * the second memory bank. It is not an error and we can safely
- * avoid creating a zero-size memory bank.
- */
-if ( tot_size == 0 )
-return true;
-
-bank = &mem->bank[mem->nr_banks];
-bank->start = gfn_to_gaddr(sgfn);
-bank->size = tot_size;
-
-/*
- * Allocate pages from the heap until tot_size is zero and map them to the
- * guest using guest_map_pages, passing the starting gfn as extra parameter
- * for the map operation.
- */
-if ( !allocate_domheap_memory(d, tot_size, guest_map_pages, &sgfn) )
-return false;
-
-mem->nr_banks++;
-kinfo->unassigned_mem -= bank->size;
-
-return true;
-}
-
 /*
  * When PCI passthrough is available we want to keep the
  * "linux,pci-domain" in sync for every host bridge.
@@ -899,226 +795,6 @@ int __init add_ext_regions(unsigned long s_gfn, unsigned 
long e_gfn,
 return 0;
 }
 
-static int __init add_hwdom_free_regions(unsigned long s_gfn,
- unsigned long e_gfn, void *data)
-{
-struct membanks *free_regi

[PATCH v2 0/8] Move parts of Arm's Dom0less to common code

2025-04-14 Thread Oleksii Kurochko

Some parts of Arm's Dom0less solution could be moved to common code as they are
not truly Arm-specific.

Most of the code is moved as is, with only minor changes introduced to provide
abstractions that hide Arm-specific details, while maintaining functional
equivalence with original Arm's code.

There are several open questions:
1. Probably, the introduced headers currently placed in asm-generic should
   instead reside in the xen/include folder.
2. Perhaps the introduced *.c files should always be placed elsewhere. They
   have been put in device-tree common as they somewhat depend on device tree
   functionality.
3. The u64 and u32 types are widely used in the code where device tree
   functionality is implemented because these types are used in device tree
   function arguments.
   Should this be reworked to use uint32_t and uint64_t instead? If so, will it
   also be necessary to change the type of variables passed to dt-related
   functions, or should the argument types of device tree functions be updated
   too? For example:
   ```
u64 mem;
...
rc = dt_property_read_u64(node, "memory", &mem);
   ```
   where dt_property_read_u64 is declared as:
 bool dt_property_read_u64(... , u64 *out_value);
4. Instead of providing init_intc_phandle() (see the patch: [1]), perhaps it
   would be better to add a for loop in domain_handle_dtb_bootmodule()?
   Something like:
   ```
bool is_intc_phandle_inited = false;
for ( unsigned int i = 0; i < ARRAY_SIZE(intc_names_array); i++ )
{
if ( dt_node_cmp(name, intc_names_array[i]) == 0 )
{
uint32_t phandle_intc = fdt_get_phandle(pfdt, node_next);

if ( phandle_intc != 0 )
kinfo->phandle_intc = phandle_intc;

is_intc_phandle_inited = true;
break;
}
}

if ( is_intc_phandle_inited ) continue;
  ```

[1] [PATCH v1 9/9] xen/common: dom0less: introduce common dom0less-build.c

---
Changes in v2:
- Update cover letter message.
- Rebase on top of the current staging.
- Drop patches:
   - asm-generic: move Arm's static-memory.h to asm-generic
   - asm-generic: move Arm's static-shmem.h to asm-generic
  as there are no real users of STATIC_MEMORY and
  STATIC_SHMEM in the near future.
- Add new cleanup patch:
  [PATCH v2 1/8] xen/arm: drop declaration of handle_device_interrupts()
- All other changes are patch specific. Please check them separately for each
  patch
---

Oleksii Kurochko (8):
  xen/arm: drop declaration of handle_device_interrupts()
  xen/common: dom0less: make some parts of Arm's CONFIG_DOM0LESS common
  asm-generic: move parts of Arm's asm/kernel.h to common code
  arm/static-shmem.h: drop inclusion of asm/setup.h
  asm-generic: move some parts of Arm's domain_build.h to common
  xen/common: dom0less: introduce common kernel.c
  xen/common: dom0less: introduce common domain-build.c
  xen/common: dom0less: introduce common dom0less-build.c

 xen/arch/arm/Kconfig  |  10 +-
 xen/arch/arm/acpi/domain_build.c  |   4 +-
 xen/arch/arm/dom0less-build.c | 997 +++---
 xen/arch/arm/domain_build.c   | 411 +
 xen/arch/arm/include/asm/Makefile |   1 +
 xen/arch/arm/include/asm/dom0less-build.h |  32 -
 xen/arch/arm/include/asm/domain_build.h   |  31 +-
 xen/arch/arm/include/asm/kernel.h | 126 +--
 xen/arch/arm/include/asm/static-memory.h  |   2 +-
 xen/arch/arm/include/asm/static-shmem.h   |   2 +-
 xen/arch/arm/kernel.c | 234 +
 xen/arch/arm/static-memory.c  |   1 +
 xen/arch/arm/static-shmem.c   |   3 +-
 xen/common/Kconfig|  19 +
 xen/common/device-tree/Makefile   |   3 +
 xen/common/device-tree/dom0less-build.c   | 891 +++
 xen/common/device-tree/domain-build.c | 404 +
 xen/common/device-tree/dt-overlay.c   |   4 +-
 xen/common/device-tree/kernel.c   | 242 ++
 xen/include/asm-generic/dom0less-build.h  |  82 ++
 xen/include/xen/fdt-domain-build.h|  77 ++
 xen/include/xen/fdt-kernel.h  | 146 
 22 files changed, 2013 insertions(+), 1709 deletions(-)
 delete mode 100644 xen/arch/arm/include/asm/dom0less-build.h
 create mode 100644 xen/common/device-tree/dom0less-build.c
 create mode 100644 xen/common/device-tree/domain-build.c
 create mode 100644 xen/common/device-tree/kernel.c
 create mode 100644 xen/include/asm-generic/dom0less-build.h
 create mode 100644 xen/include/xen/fdt-domain-build.h
 create mode 100644 xen/include/xen/fdt-kernel.h

-- 
2.49.0

Re: [PATCH v3 11/16] x86/hyperlaunch: locate dom0 initrd with hyperlaunch

2025-04-14 Thread Alejandro Vallejo

On Mon Apr 14, 2025 at 6:06 PM BST, Alejandro Vallejo wrote:
> On Thu Apr 10, 2025 at 12:34 PM BST, Jan Beulich wrote:
>> On 08.04.2025 18:07, Alejandro Vallejo wrote:
>>
>>> +printk("  ramdisk: boot module %d\n", idx);
>>> +bi->mods[idx].type = BOOTMOD_RAMDISK;
>>> +bd->module = &bi->mods[idx];
>>
>> The field's named "module" now, but that now ends up inconsistent with
>> naming used elsewhere, as is pretty noticeable here.
>
> Well, yes. It is confusing. Also, the DTB is called multiboot,ramdisk,
> because multiboot,module is already used to detect what nodes are
> expressed as multiboot,modules. I'm considering going back and calling
> them ramdisk again. If anything, to avoid the ambiguity between
> domain modules and multiboot modules. e.g: a kernel is a multiboot
> module, but not a domain module.

Particularly when misc/arm/device-tree/booting.txt already states that
the initrd for dom0 ought to be provided with the "multiboot,ramdisk"
string in the "compatible" prop.  Deviating from that is just going to
make it far more annoying to unify arm and x86 in the future.  And
calling those ramdisks anything but ramdisk internally is just plain
confusing (as evidenced in the current series).

So... how frontally opposed would you be to restoring the ramdisk
nomenclature? Also, for ease of rebasing future patches it'd be far
nicer to go back to ramdisk rather than reinventing some new name.

I'm for the time being leaving things as they are (because it is a pain
to change these things) until we settle on something.

Cheers,
Alejandro

Re: [PATCH v3 09/16] x86/hyperlaunch: locate dom0 kernel with hyperlaunch

2025-04-14 Thread Alejandro Vallejo

On Wed Apr 9, 2025 at 10:24 PM BST, Denis Mukhin wrote:
> On Tuesday, April 8th, 2025 at 9:07 AM, Alejandro Vallejo  
> wrote:
>
>> 
>> 
>> From: "Daniel P. Smith" dpsm...@apertussolutions.com
>> 
>> 
>> Look for a subnode of type `multiboot,kernel` within a domain node. If
>> found, locate it using the multiboot module helper to generically ensure
>> it lives in the module list. If the bootargs property is present and
>> there was not an MB1 string, then use the command line from the device
>> tree definition.
>> 
>> Signed-off-by: Daniel P. Smith dpsm...@apertussolutions.com
>> 
>> Signed-off-by: Jason Andryuk jason.andr...@amd.com
>> 
>> Signed-off-by: Alejandro Vallejo agarc...@amd.com
>> 
>> ---
>> v3:
>> * Add const to fdt
>> * Remove idx == NULL checks
>> * Add BUILD_BUG_ON for MAX_NR_BOOTMODS fitting in a uint32_t
>> * Remove trailing ) from printks
>> * Return ENODATA for missing kernel
>> * Re-work "max domains" warning and print limit
>> * fdt_cell_as_u32/directly return values
>> * Remove "pairs" looping from fdt_get_reg_prop() and only grab 1.
>> * Use addr_cells and size_cells
>> ---
>> xen/arch/x86/domain-builder/core.c | 11 ++
>> xen/arch/x86/domain-builder/fdt.c | 57 ++
>> xen/arch/x86/setup.c | 5 ---
>> 3 files changed, 68 insertions(+), 5 deletions(-)
>> 
>> diff --git a/xen/arch/x86/domain-builder/core.c 
>> b/xen/arch/x86/domain-builder/core.c
>> index c50eff34fb..eda7fa7a8f 100644
>> --- a/xen/arch/x86/domain-builder/core.c
>> +++ b/xen/arch/x86/domain-builder/core.c
>> @@ -59,6 +59,17 @@ void __init builder_init(struct boot_info *bi)
>> 
>> printk(XENLOG_INFO " Number of domains: %d\n", bi->nr_domains);
>> 
>> }
>> + else
>> + {
>> + unsigned int i;
>> +
>> + /* Find first unknown boot module to use as Dom0 kernel */
>> + printk("Falling back to using first boot module as dom0\n");
>> + i = first_boot_module_index(bi, BOOTMOD_UNKNOWN);
>> + bi->mods[i].type = BOOTMOD_KERNEL;
>> 
>> + bi->domains[0].kernel = &bi->mods[i];
>> 
>> + bi->nr_domains = 1;
>> 
>> + }
>> }
>> 
>> /*
>> diff --git a/xen/arch/x86/domain-builder/fdt.c 
>> b/xen/arch/x86/domain-builder/fdt.c
>> index 9ebc8fd0e4..a037c8b6cb 100644
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -155,6 +155,52 @@ int __init fdt_read_multiboot_module(const void *fdt, 
>> int node,
>> return idx;
>> }
>> 
>> +static int __init process_domain_node(
>> + struct boot_info *bi, const void *fdt, int dom_node)
>> +{
>> + int node;
>> + struct boot_domain *bd = &bi->domains[bi->nr_domains];
>> 
>> + const char *name = fdt_get_name(fdt, dom_node, NULL) ?: "unknown";
>> + int address_cells = fdt_address_cells(fdt, dom_node);
>> + int size_cells = fdt_size_cells(fdt, dom_node);
>> +
>> + fdt_for_each_subnode(node, fdt, dom_node)
>> + {
>> + if ( fdt_node_check_compatible(fdt, node, "multiboot,kernel") == 0 )
>> + {
>> + int idx;
>> +
>> + if ( bd->kernel )
>> 
>> + {
>> + printk(XENLOG_ERR "Duplicate kernel module for domain %s\n",
>
> Looks like it should be XENLOG_WARNING since the loop continues.

Fair point.

>
> Also, I would use either Capitalized or lower case messages everywhere
> for consistency.

That's related to those leading spaces. The lowercases end up
immediately under the configuration message so it's easier to bind them
visually as "hyperlaunch-related".

(XEN) Hyperlaunch configuration:
(XEN)   something
(XEN)   failed processing kernel module for domain %s

>
>> + name);
>> + continue;
>> + }
>> +
>> + idx = fdt_read_multiboot_module(fdt, node, address_cells,
>> + size_cells, bi);
>> + if ( idx < 0 )
>> + {
>> + printk(" failed processing kernel module for domain %s\n",
>
> I think this printout should have XENLOG_ERR in it since it's on the
> error code path.

All of these should have a XENLOG_X so they can be skipped when _INFO
is itself filtered out.

>
>> + name);
>> + return idx;
>> + }
>> +
>> + printk(" kernel: boot module %d\n", idx);
>> + bi->mods[idx].type = BOOTMOD_KERNEL;
>> 
>> + bd->kernel = &bi->mods[idx];
>> 
>> + }
>> + }
>> +
>> + if ( !bd->kernel )
>> 
>> + {
>> + printk(XENLOG_ERR "ERR: no kernel assigned to domain\n");
>
> Drop "ERR" since it is already XENLOG_ERR level?

ERR: is printed though, whereas XENLOG_ERR is not. That's meant to make
it visually clear that's _really_ not meant to happen.

>
>> + return -ENODATA;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int __init find_hyperlaunch_node(const void *fdt)
>> {
>> int hv_node = fdt_path_offset(fdt, "/chosen/hypervisor");
>> @@ -217,9 +263,20 @@ int __init walk_hyperlaunch_fdt(struct boot_info *bi)
>> 
>> fdt_for_each_subnode(node, fdt, hv_node)
>> {
>> + if ( bi->nr_domains >= MAX_NR_BOOTDOMS )
>> 
>> + {
>> + printk(XENLOG_WARNING
>> + "WARN: only creating first %u domains\n", MAX_NR_BOOTDOMS);
>
> Drop "WARN" since it is already XENLOG_WARNING level?

Same rationale as above.

>
>> + break;
>> + }
>> +
>> ret = fdt_node_check_compatible(fdt, node, "xen,

Re: [PATCH v3 5/6] CI: save toolstack artifact as cpio.gz

2025-04-14 Thread Anthony PERARD

On Mon, Apr 14, 2025 at 12:09:02PM +0100, Andrew Cooper wrote:
> From: Marek Marczykowski-Górecki 
> 
> This avoids the need to re-compress it in every test job.  This saves minutes
> of wallclock time.
> 
> Signed-off-by: Marek Marczykowski-Górecki 
> Reviewed-by: Andrew Cooper 

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available

2025-04-14 Thread Xin Li


On 4/12/2025 4:10 PM, H. Peter Anvin wrote:

Also, *in this specific case* IA32_SPEC_CTRL is architecturally nonserializing, 
i.e. WRMSR executes as WRMSRNS anyway.


While the immediate form WRMSRNS could be faster because the MSR index
is available *much* earlier in the pipeline, right?

Re: [PATCH v3 14/16] x86/hyperlaunch: add memory parsing to domain config

2025-04-14 Thread Alejandro Vallejo

On Wed Apr 9, 2025 at 11:29 PM BST, Denis Mukhin wrote:
> On Tuesday, April 8th, 2025 at 9:07 AM, Alejandro Vallejo  
> wrote:
>
>> 
>> 
>> From: "Daniel P. Smith" dpsm...@apertussolutions.com
>> 
>> 
>> Add three properties, memory, mem-min, and mem-max, to the domain node device
>> tree parsing to define the memory allocation for a domain. All three fields 
>> are
>> expressed in kb and written as a u64 in the device tree entries.
>> 
>> Signed-off-by: Daniel P. Smith dpsm...@apertussolutions.com
>> 
>> Reviewed-by: Jason Andryuk jason.andr...@amd.com
>> 
>> ---
>> xen/arch/x86/dom0_build.c | 8 ++
>> xen/arch/x86/domain-builder/fdt.c | 34 ++
>> xen/arch/x86/include/asm/boot-domain.h | 4 +++
>> xen/include/xen/libfdt/libfdt-xen.h | 10 
>> 4 files changed, 56 insertions(+)
>> 
>> diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
>> index 0b467fd4a4..36fb090643 100644
>> --- a/xen/arch/x86/dom0_build.c
>> +++ b/xen/arch/x86/dom0_build.c
>> @@ -627,6 +627,14 @@ int __init construct_dom0(const struct boot_domain *bd)
>> 
>> process_pending_softirqs();
>> 
>> + /* If param dom0_size was not set and HL config provided memory size */
>> + if ( !get_memsize(&dom0_size, LONG_MAX) && bd->mem_pages )
>> 
>> + dom0_size.nr_pages = bd->mem_pages;
>> 
>> + if ( !get_memsize(&dom0_min_size, LONG_MAX) && bd->min_pages )
>> 
>> + dom0_size.nr_pages = bd->min_pages;
>> 
>> + if ( !get_memsize(&dom0_max_size, LONG_MAX) && bd->max_pages )
>> 
>> + dom0_size.nr_pages = bd->max_pages;
>> 
>> +
>> if ( is_hvm_domain(d) )
>> rc = dom0_construct_pvh(bd);
>> else if ( is_pv_domain(d) )
>> diff --git a/xen/arch/x86/domain-builder/fdt.c 
>> b/xen/arch/x86/domain-builder/fdt.c
>> index da65f6a5a0..338b4838c2 100644
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -6,6 +6,7 @@
>> #include 
>> 
>> #include 
>> 
>> #include 
>> 
>> +#include 
>> 
>> 
>> #include 
>> 
>> #include 
>> 
>> @@ -212,6 +213,39 @@ static int __init process_domain_node(
>> else
>> printk("PV\n");
>> }
>> + else if ( strncmp(prop_name, "memory", name_len) == 0 )
>> + {
>> + uint64_t kb;
>> + if ( fdt_prop_as_u64(prop, &kb) != 0 )
>> + {
>> + printk(" failed processing memory for domain %s\n", name);
>> + return -EINVAL;
>> + }
>> + bd->mem_pages = PFN_DOWN(kb * SZ_1K);
>
> Perhaps use shorter form of KB(kb) (KB() from include/xen/config.h)?
>
> What do you think?

Sure.

Cheers,
Alejandro
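
For reference, the kb-to-pages conversion under discussion can be sketched as below. The SKETCH_* macros and kb_to_pages() are hypothetical stand-ins defined locally (assuming 4 KiB pages); they are not the Xen SZ_1K, KB() or PFN_DOWN() definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SZ_1K       1024ULL
#define SKETCH_PAGE_SHIFT  12 /* assumption: 4 KiB pages */
#define SKETCH_PFN_DOWN(x) ((x) >> SKETCH_PAGE_SHIFT)

/* Convert a size in KiB to a number of whole pages, rounding down. */
static uint64_t kb_to_pages(uint64_t kb)
{
    /* Guard the multiplication against wrap-around. */
    assert(kb <= UINT64_MAX / SKETCH_SZ_1K);

    return SKETCH_PFN_DOWN(kb * SKETCH_SZ_1K);
}
```

Because the conversion rounds down, a value smaller than one page (for example 3 kb) yields zero pages.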

Re: [PATCH v3 14/16] x86/hyperlaunch: add memory parsing to domain config

2025-04-14 Thread Alejandro Vallejo

On Thu Apr 10, 2025 at 1:03 PM BST, Jan Beulich wrote:
> On 08.04.2025 18:07, Alejandro Vallejo wrote:
>> @@ -212,6 +213,39 @@ static int __init process_domain_node(
>>  else
>>  printk("PV\n");
>>  }
>> +else if ( strncmp(prop_name, "memory", name_len) == 0 )
>> +{
>> +uint64_t kb;
>> +if ( fdt_prop_as_u64(prop, &kb) != 0 )
>
> Nit (you know what I have to say here, and again below.)

Ack

>
>> +{
>> +printk("  failed processing memory for domain %s\n", name);
>> +return -EINVAL;
>
> Any reason to override fdt_prop_as_u64()'s return value here?

Mostly to avoid needing to recover the error code. I'll just do it.

>
>> +}
>> +bd->mem_pages = PFN_DOWN(kb * SZ_1K);
>> +printk("  memory: %ld kb\n", kb);
>> +}
>> +else if ( strncmp(prop_name, "mem-min", name_len) == 0 )
>> +{
>> +uint64_t kb;
>> +if ( fdt_prop_as_u64(prop, &kb) != 0 )
>> +{
>> +printk("  failed processing memory for domain %s\n", name);
>> +return -EINVAL;
>> +}
>> +bd->min_pages = PFN_DOWN(kb * SZ_1K);
>> +printk("  min memory: %ld kb\n", kb);
>> +}
>> +else if ( strncmp(prop_name, "mem-max", name_len) == 0 )
>> +{
>> +uint64_t kb;
>> +if ( fdt_prop_as_u64(prop, &kb) != 0 )
>> +{
>> +printk("  failed processing memory for domain %s\n", name);
>
> All three error messages being identical doesn't help diagnosing issues.

Indeed. Will add the prop that triggers each.

>
>> --- a/xen/include/xen/libfdt/libfdt-xen.h
>> +++ b/xen/include/xen/libfdt/libfdt-xen.h
>> @@ -34,6 +34,16 @@ static inline int __init fdt_prop_as_u32(
>>  return 0;
>>  }
>>  
>> +static inline int __init fdt_prop_as_u64(
>> +const struct fdt_property *prop, uint64_t *val)
>> +{
>> +if ( !prop || fdt32_to_cpu(prop->len) < sizeof(u64) )
>> +return -EINVAL;
>> +
>> +*val = fdt_cell_as_u64((fdt32_t *)prop->data);
>
> Please avoid casting away const. Looks like I overlooked this in
> fdt_prop_as_u32() that was introduced by an earlier patch.

As part of v4 I moved this and fdt_prop_as_u32() earlier to patch8 and
already adjusted accordingly.

Cheers,
Alejandro
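
One way to read a u64 property without casting away const, sketched with hypothetical helpers (be32_read(), prop_as_u64_sketch()) that operate on a raw byte buffer instead of libfdt's struct fdt_property. It assumes the value is stored as two big-endian 32-bit cells, most significant first, and is not the code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit cell from a byte buffer. */
static uint32_t be32_read(const uint8_t *p)
{
    assert(p != NULL);

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is missing or too short. */
static int prop_as_u64_sketch(const void *data, size_t len, uint64_t *val)
{
    const uint8_t *p = data; /* const is preserved, no cast needed */

    if ( !data || !val || len < sizeof(uint64_t) )
        return -1;

    /* The first cell holds the upper 32 bits, the second the lower 32. */
    *val = ((uint64_t)be32_read(p) << 32) | be32_read(p + 4);

    return 0;
}
```

Keeping the pointer const all the way down means the helper's signature already documents that the property data is never modified.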

Re: [PATCH v3 15/16] x86/hyperlaunch: add max vcpu parsing of hyperlaunch device tree

2025-04-14 Thread Alejandro Vallejo

On Wed Apr 9, 2025 at 11:33 PM BST, Denis Mukhin wrote:
> On Tuesday, April 8th, 2025 at 9:07 AM, Alejandro Vallejo  
> wrote:
>
>> 
>> 
>> From: "Daniel P. Smith" dpsm...@apertussolutions.com
>> 
>> 
>> Introduce the `cpus` property, named as such for dom0less compatibility, that
>> represents the maximum number of vcpus to allocate for a domain. In the device
>> tree, it will be encoded as a u32 value.
>> 
>> Signed-off-by: Daniel P. Smith dpsm...@apertussolutions.com
>> 
>> Reviewed-by: Jason Andryuk jason.andr...@amd.com
>> 
>> ---
>> xen/arch/x86/dom0_build.c | 3 +++
>> xen/arch/x86/domain-builder/fdt.c | 11 +++
>> xen/arch/x86/include/asm/boot-domain.h | 2 ++
>> 3 files changed, 16 insertions(+)
>> 
>> diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
>> index 36fb090643..7b3e31a08f 100644
>> --- a/xen/arch/x86/dom0_build.c
>> +++ b/xen/arch/x86/dom0_build.c
>> @@ -635,6 +635,9 @@ int __init construct_dom0(const struct boot_domain *bd)
>> if ( !get_memsize(&dom0_max_size, LONG_MAX) && bd->max_pages )
>> 
>> dom0_size.nr_pages = bd->max_pages;
>> 
>> 
>> + if ( opt_dom0_max_vcpus_max == UINT_MAX && bd->max_vcpus )
>> 
>> + opt_dom0_max_vcpus_max = bd->max_vcpus;
>> 
>> +
>> if ( is_hvm_domain(d) )
>> rc = dom0_construct_pvh(bd);
>> else if ( is_pv_domain(d) )
>> diff --git a/xen/arch/x86/domain-builder/fdt.c b/xen/arch/x86/domain-builder/fdt.c
>> index 338b4838c2..5fcb767bdd 100644
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -246,6 +246,17 @@ static int __init process_domain_node(
>> bd->max_pages = PFN_DOWN(kb * SZ_1K);
>> 
>> printk(" max memory: %ld kb\n", kb);
>> }
>> + else if ( strncmp(prop_name, "cpus", name_len) == 0 )
>> + {
>> + uint32_t val = UINT_MAX;
>> + if ( fdt_prop_as_u32(prop, &val) != 0 )
>> + {
>> + printk(" failed processing max_vcpus for domain %s\n", name);
>
> Suggest adding XENLOG_ERR to the error message.

And XENLOG_INFO to the one below.

Ack.

Cheers,
Alejandro

Re: [PATCH v3 08/16] x86/hyperlaunch: Add helpers to locate multiboot modules

2025-04-14 Thread Nicola Vetrini


On 2025-04-14 17:05, Jan Beulich wrote:

On 14.04.2025 15:37, Alejandro Vallejo wrote:

On Thu Apr 10, 2025 at 11:42 AM BST, Jan Beulich wrote:

On 08.04.2025 18:07, Alejandro Vallejo wrote:

+/*
+ * Locate a multiboot module given its node offset in the FDT.
+ *
+ * The module location may be given via either FDT property:
+ * * reg = 
+ * * Mutates `bi` to append the module.
+ * * module-index = 
+ * * Leaves `bi` unchanged.
+ *
+ * @param fdt   Pointer to the full FDT.
+ * @param node  Offset for the module node.
+ * @param address_cells Number of 4-octet cells that make up an "address".
+ * @param size_cells    Number of 4-octet cells that make up a "size".
+ * @param bi[inout]     Xen's representation of the boot parameters.
+ * @return  -EINVAL on malformed nodes, otherwise
+ *  index inside `bi->mods`
+ */
+int __init fdt_read_multiboot_module(const void *fdt, int node,
+ int address_cells, int size_cells,
+ struct boot_info *bi)


Functions without callers and non-static ones without declarations are
disliked by Misra.


Can't do much about it if I want them to stand alone in a single patch.
Otherwise the following ones become quite unwieldy to look at. All I can
say is that this function becomes static and with a caller on the next
patch.


Which means you need to touch this again anyway. Perhaps we need a Misra
deviation for __maybe_unused functions / data, in which case you could
use that here and strip it along with making the function static. Cc-ing
Bugseng folks.



There is already an exception for __maybe_unused on labels (Rule 2.6). 
In principle it could be easily extended to encompass unused functions 
(which are verified by another rule), with a suitable rationale.



+/* Otherwise location given as a `reg` property. */
+prop = fdt_get_property(fdt, node, "reg", NULL);
+
+if ( !prop )
+{
+printk("  No location for multiboot,module\n");
+return -EINVAL;
+}
+if ( fdt_get_property(fdt, node, "module-index", NULL) )
+{
+printk("  Location of multiboot,module defined multiple times\n");
+return -EINVAL;
+}
+
+ret = read_fdt_prop_as_reg(prop, address_cells, size_cells, &addr, &size);
+
+if ( ret < 0 )
+{
+printk("  Failed reading reg for multiboot,module\n");
+return -EINVAL;
+}
+
+idx = bi->nr_modules + 1;


This at least looks like an off-by-one. If the addition of 1 is really
intended, I think it needs commenting on.


Seems to be, yes. The underlying array is a bit bizarre. It's sized as
MAX_NR_BOOTMODS + 1, with the first one being the DTB itself. I guess
the intent was to take it into account, but bi->nr_modules is
initialised to the number of multiboot modules, so it SHOULD be already
taking it into account.

Also, the logic for bounds checking seems... off (because of the + 1 I
mentioned before). Or at least confusing, so I've moved to using
ARRAY_SIZE(bi->mods) rather than explicitly comparing against
MAX_NR_BOOTMODS.

The array is MAX_NR_BOOTMODS + 1 in length, so it's just more cognitive
load than I'm comfortable with.


If I'm not mistaken the +1 is inherited from the modules array we had in
the past, where we wanted 1 extra slot for Xen itself. Hence before you
move to using ARRAY_SIZE() everywhere it needs to really be clear what
the +1 here is used for.
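
To make the concern concrete, here is a minimal sketch of an append helper
that bounds-checks with an ARRAY_SIZE()-style macro while keeping one slot
reserved. It assumes the extra element is reserved for a single special
entry, which is exactly the point the thread leaves open. All demo_* names
and the limit of 4 are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_NR_BOOTMODS 4
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct demo_boot_module {
    unsigned long start, size;
};

struct demo_boot_info {
    unsigned int nr_modules;   /* modules appended so far */
    /* One slot beyond the module limit, assumed reserved for a special entry. */
    struct demo_boot_module mods[DEMO_MAX_NR_BOOTMODS + 1];
};

/* Append a module and return its index, or -ENOSPC when no slot is left. */
static int demo_append_module(struct demo_boot_info *bi,
                              unsigned long start, unsigned long size)
{
    unsigned int idx;

    assert(bi);

    idx = bi->nr_modules;

    /* The last array element is reserved, hence the explicit "- 1". */
    if ( idx >= DEMO_ARRAY_SIZE(bi->mods) - 1 )
        return -ENOSPC;

    bi->mods[idx].start = start;
    bi->mods[idx].size = size;
    bi->nr_modules = idx + 1;

    return (int)idx;
}
```

Dropping the "- 1" silently hands the reserved slot to a regular module,
which is why the meaning of the extra element has to be settled first.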


--- a/xen/include/xen/libfdt/libfdt-xen.h
+++ b/xen/include/xen/libfdt/libfdt-xen.h
@@ -13,6 +13,63 @@

 #include 

+static inline int __init fdt_cell_as_u32(const fdt32_t *cell)


Why plain int here, but ...


+{
+return fdt32_to_cpu(*cell);
+}
+
+static inline uint64_t  __init fdt_cell_as_u64(const fdt32_t *cell)


... a fixed-width and unsigned type here? Question is whether the former
helper is really warranted.

Also nit: Stray double blank.


+{
+return ((uint64_t)fdt32_to_cpu(cell[0]) << 32) | fdt32_to_cpu(cell[1]);


That is - uniformly big endian?


These helpers are disappearing, so it doesn't matter. This is basically
an open coded:

  fdt64_to_cpu(*(const fdt64_t *)fdt32)

And, yes. DTBs are standardised as having big-endian properties, for
better or worse :/




+}


Marking such relatively generic inline functions __init is also somewhat
risky.


They were originally in domain-builder/fdt.c and moved here as a result
of a request to have them on libfdt. libfdt proved to be somewhat
annoying because it would be hard to distinguish accessors for the
flattened and the unflattened tree.

I'd personally have them in domain-builder instead, where they are used.
Should they be needed somewhere else, we can always factor them out
somewhere else.

Thoughts?


As long as they're needed only by domain-builder, it's probably fine to have
them just there.

Jan


--
Nicola Vetrini, B.Sc.
Software Engineer
BUGSENG (https://bugseng.com)

[PATCH v4 3/3] drivers: Make ioapic_sbdf and hpet_sbdf contain pci_sbdf_t

2025-04-14 Thread Andrii Sultanov

From: Andrii Sultanov 

Following a similar change to amd_iommu struct, make two more structs
take pci_sbdf_t directly instead of seg and bdf separately. This lets us
drop several conversions from the latter to the former and simplifies
several comparisons and assignments.
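
A minimal sketch of why folding helps, assuming a simplified union in the
spirit of pci_sbdf_t (the real Xen type exposes more views, such as bus and
devfn). demo_sbdf_t and the helpers are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for pci_sbdf_t: one 32-bit view, two 16-bit halves. */
typedef union {
    uint32_t sbdf;
    struct {
        uint16_t bdf;
        uint16_t seg;
    };
} demo_sbdf_t;

static_assert(sizeof(demo_sbdf_t) == sizeof(uint32_t), "no padding expected");

static demo_sbdf_t demo_make_sbdf(uint16_t seg, uint16_t bdf)
{
    demo_sbdf_t s = { .sbdf = 0 };

    s.seg = seg;
    s.bdf = bdf;

    return s;
}

/* Before: two 16-bit comparisons joined by a logical AND. */
static bool demo_match_split(uint16_t seg_a, uint16_t bdf_a,
                             uint16_t seg_b, uint16_t bdf_b)
{
    return bdf_a == bdf_b && seg_a == seg_b;
}

/* After: a single 32-bit comparison on the combined value. */
static bool demo_match_folded(demo_sbdf_t a, demo_sbdf_t b)
{
    return a.sbdf == b.sbdf;
}
```

Both helpers give the same answer for any pair of inputs; the folded form
simply lets the compiler emit one compare instead of two.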

Bloat-o-meter reports:
add/remove: 0/0 grow/shrink: 1/10 up/down: 256/-320 (-64)
Function old new   delta
_einittext 22092   22348+256
parse_ivrs_hpet  248 245  -3
amd_iommu_detect_one_acpi876 868  -8
iov_supports_xt  275 264 -11
amd_iommu_read_ioapic_from_ire   344 332 -12
amd_setup_hpet_msi   237 224 -13
amd_iommu_ioapic_update_ire  575 555 -20
reserve_unity_map_for_device 453 424 -29
_hvm_dpci_msi_eoi160 128 -32
amd_iommu_get_supported_ivhd_type 86  30 -56
parse_ivrs_table39663830-136

Signed-off-by: Andrii Sultanov 

---
Changes in V4:
* Folded several separate seg+bdf comparisons and assignments into one
  with sbdf_t
* With reshuffling in the prior commits, this commit is no longer
  neutral in terms of code size

Changes in V3:
* Dropped aliasing of seg and bdf, renamed users.

Changes in V2:
* Split single commit into several patches
* Change the format specifier to %pp in amd_iommu_ioapic_update_ire
---
 xen/drivers/passthrough/amd/iommu.h  |  5 +--
 xen/drivers/passthrough/amd/iommu_acpi.c | 30 +++-
 xen/drivers/passthrough/amd/iommu_intr.c | 44 +++-
 3 files changed, 37 insertions(+), 42 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
index 2599800e6a..52f748310b 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -262,7 +262,7 @@ int cf_check amd_setup_hpet_msi(struct msi_desc *msi_desc);
 void cf_check amd_iommu_dump_intremap_tables(unsigned char key);
 
 extern struct ioapic_sbdf {
-u16 bdf, seg;
+pci_sbdf_t sbdf;
 u8 id;
 bool cmdline;
 u16 *pin_2_idx;
@@ -273,7 +273,8 @@ unsigned int ioapic_id_to_index(unsigned int apic_id);
 unsigned int get_next_ioapic_sbdf_index(void);
 
 extern struct hpet_sbdf {
-u16 bdf, seg, id;
+pci_sbdf_t sbdf;
+uint16_t id;
 enum {
 HPET_NONE,
 HPET_CMDL,
diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c
index 9e4fbee953..14845766e6 100644
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -707,8 +707,7 @@ static int __init cf_check parse_ivrs_ioapic(const char *str)
 }
 }
 
-ioapic_sbdf[idx].bdf = PCI_BDF(bus, dev, func);
-ioapic_sbdf[idx].seg = seg;
+ioapic_sbdf[idx].sbdf = PCI_SBDF( seg, PCI_BDF(bus, dev, func) );
 ioapic_sbdf[idx].id = id;
 ioapic_sbdf[idx].cmdline = true;
 
@@ -734,8 +733,7 @@ static int __init cf_check parse_ivrs_hpet(const char *str)
 return -EINVAL;
 
 hpet_sbdf.id = id;
-hpet_sbdf.bdf = PCI_BDF(bus, dev, func);
-hpet_sbdf.seg = seg;
+hpet_sbdf.sbdf = PCI_SBDF( seg, PCI_BDF(bus, dev, func) );
 hpet_sbdf.init = HPET_CMDL;
 
 return 0;
@@ -748,6 +746,7 @@ static u16 __init parse_ivhd_device_special(
 {
 u16 dev_length, bdf;
 unsigned int apic, idx;
+pci_sbdf_t sbdf;
 
 dev_length = sizeof(*special);
 if ( header_length < (block_length + dev_length) )
@@ -757,6 +756,7 @@ static u16 __init parse_ivhd_device_special(
 }
 
 bdf = special->used_id;
+sbdf = PCI_SBDF(seg, bdf);
 if ( bdf >= ivrs_bdf_entries )
 {
 AMD_IOMMU_ERROR("IVHD: invalid Device_Entry Dev_Id %#x\n", bdf);
@@ -764,7 +764,7 @@ static u16 __init parse_ivhd_device_special(
 }
 
 AMD_IOMMU_DEBUG("IVHD Special: %pp variety %#x handle %#x\n",
-&PCI_SBDF(seg, bdf), special->variety, special->handle);
+&sbdf, special->variety, special->handle);
 add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, 0, true,
iommu);
 
@@ -780,8 +780,7 @@ static u16 __init parse_ivhd_device_special(
  */
 for ( idx = 0; idx < nr_ioapic_sbdf; idx++ )
 {
-if ( ioapic_sbdf[idx].bdf == bdf &&
- ioapic_sbdf[idx].seg == seg &&
+if ( ioapic_sbdf[idx].sbdf.sbdf == sbdf.sbdf &&
  ioapic_sbdf[idx].cmdline )
 break;
 }
@@ -790,7 +789,7 @@ static u16 __init parse_ivhd_device_special(
 AMD_IOMMU_DEBUG("IVHD: Command line override present for IO-APIC %#x"
 "(IVRS: %#x devID %pp)\n",
 ioapic_sbdf[idx].id, special->handle,
-

Re: [RFC PATCH v1 13/15] x86/msr: Use the alternatives mechanism to read MSR

2025-04-14 Thread Francesco Lavra

On 2025-03-31 at 8:22, Xin Li (Intel) wrote:
> diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
> index e672632b1cc0..6e7a9daa03d4 100644
> --- a/arch/x86/xen/xen-asm.S
> +++ b/arch/x86/xen/xen-asm.S
> @@ -399,3 +399,37 @@ SYM_CODE_END(xen_entry_SYSCALL_compat)
>   RET
>  SYM_FUNC_END(asm_xen_write_msr)
>  EXPORT_SYMBOL_GPL(asm_xen_write_msr)
> +
> +/*
> + * The prototype of the Xen C code:
> + *   struct { u64 val, bool done } xen_do_read_msr(u32 msr)
> + */
> +SYM_FUNC_START(asm_xen_read_msr)
> + ENDBR
> + FRAME_BEGIN
> + XEN_SAVE_CALLEE_REGS_FOR_MSR
> + mov %ecx, %edi  /* MSR number */
> + call xen_do_read_msr
> + test %dl, %dl   /* %dl=1, i.e., ZF=0, meaning successfully done */
> + XEN_RESTORE_CALLEE_REGS_FOR_MSR
> + jnz 2f
> +
> +1:   rdmsr
> + _ASM_EXTABLE_FUNC_REWIND(1b, -5, FRAME_OFFSET / (BITS_PER_LONG / 8))
> + shl $0x20, %rdx
> + or %rdx, %rax
> + /*
> +  * The top of the stack points directly at the return address;
> +  * back up by 5 bytes from the return address.
> +  */

This works only if this function has been called directly (e.g. via
`call asm_xen_write_msr`), but doesn't work with alternative call types
(like indirect calls). Not sure why one might want to use an indirect
call to invoke asm_xen_write_msr, but this creates a hidden coupling
between caller and callee.
I don't have a suggestion on how to get rid of this coupling, other
than setting ipdelta in _ASM_EXTABLE_FUNC_REWIND() to 0 and adjusting
the _ASM_EXTABLE_TYPE entries at the call sites to consider the
instruction that follows the function call (instead of the call
instruction) as the faulting instruction (which seems pretty ugly, at
least because what follows the function call could be an instruction
that might itself fault). But you may want to make this caveat explicit
in the comment.
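
To illustrate the coupling: a direct near call (`call rel32`) is encoded as
opcode 0xE8 followed by a 32-bit displacement, 5 bytes in total, whereas an
indirect call such as `call *%rax` (FF D0) is only 2 bytes. Backing up 5
bytes from the return address therefore lands on the call instruction only
in the direct case. The check below is a hypothetical sketch of that
arithmetic, not something from the patch, and it is only a heuristic since
the byte at that position could be 0xE8 by coincidence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CALL_REL32_LEN 5   /* opcode 0xE8 + 32-bit displacement */

/*
 * Given a buffer of code bytes and the offset of a return address within
 * it, report whether the byte 5 positions earlier is the direct near CALL
 * opcode, i.e. whether "return address - 5" plausibly is the call site.
 */
static bool demo_rewind_hits_direct_call(const uint8_t *code, size_t ret_off)
{
    assert(code);

    if ( ret_off < DEMO_CALL_REL32_LEN )
        return false;

    return code[ret_off - DEMO_CALL_REL32_LEN] == 0xE8;
}
```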

[PATCH v4 2/3] drivers: Change find_iommu_for_device function to take pci_sbdf_t, simplify code

2025-04-14 Thread Andrii Sultanov

From: Andrii Sultanov 

Following a similar change to amd_iommu struct, change the
find_iommu_for_device function to take pci_sbdf_t as a single parameter.
This removes conversions in the majority of cases.

Bloat-o-meter reports (on top of the first patch in the series):
add/remove: 0/0 grow/shrink: 12/11 up/down: 95/-95 (0)
Function old new   delta
amd_iommu_get_supported_ivhd_type 54  86 +32
parse_ivrs_table39553966 +11
amd_iommu_assign_device  271 282 +11
__mon_lengths   29282936  +8
update_intremap_entry_from_msi_msg   859 864  +5
iov_supports_xt  270 275  +5
amd_setup_hpet_msi   232 237  +5
amd_iommu_domain_destroy  46  51  +5
_hvm_dpci_msi_eoi155 160  +5
find_iommu_for_device242 246  +4
amd_iommu_ioapic_update_ire  572 575  +3
allocate_domain_resources 82  83  +1
amd_iommu_read_ioapic_from_ire   347 344  -3
reassign_device  843 838  -5
amd_iommu_remove_device  352 347  -5
amd_iommu_get_reserved_device_memory 521 516  -5
amd_iommu_flush_iotlb359 354  -5
amd_iommu_add_device 844 839  -5
amd_iommu_setup_domain_device   14781472  -6
build_info   752 744  -8
amd_iommu_detect_one_acpi886 876 -10
register_range_for_device297 281 -16
parse_ivmd_block13391312 -27

Signed-off-by: Andrii Sultanov 

Acked-by: Jan Beulich 

---
Changes in V4:
* After amendments to the previous commit which increased improvements
  there, this commit now does not improve code size anymore (but still
  simplifies code), so I've updated the bloat-o-meter report.

Changes in V3:
* Amended commit message
* As the previous patch dropped the aliasing of seg and bdf, renamed users of
  amd_iommu as appropriate.

Changes in V2:
* Split single commit into several patches
* Dropped brackets around &(iommu->sbdf) and &(sbdf)
* Dropped most of the hunk in _invalidate_all_devices - it was
  bloat-equivalent to the existing code - just convert with PCI_SBDF
  instead
* Dropped the hunk in get_intremap_requestor_id (iommu_intr.c) and
  amd_iommu_get_reserved_device_memory (iommu_map.c) as they were only
  increasing the code size.
* Kept "/* XXX */" where appropriate
* Fixed a slip-up in register_range_for_iommu_devices where iommu->sbdf
  replaced the usage of *different* seg and bdf.
---
 xen/drivers/passthrough/amd/iommu.h |  2 +-
 xen/drivers/passthrough/amd/iommu_acpi.c| 14 +-
 xen/drivers/passthrough/amd/iommu_cmd.c |  2 +-
 xen/drivers/passthrough/amd/iommu_init.c|  4 +--
 xen/drivers/passthrough/amd/iommu_intr.c| 17 ++--
 xen/drivers/passthrough/amd/iommu_map.c |  2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 30 ++---
 7 files changed, 35 insertions(+), 36 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
index ba541f7943..2599800e6a 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -240,7 +240,7 @@ void amd_iommu_flush_intremap(struct amd_iommu *iommu, uint16_t bdf);
 void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
 
 /* find iommu for bdf */
-struct amd_iommu *find_iommu_for_device(int seg, int bdf);
+struct amd_iommu *find_iommu_for_device(pci_sbdf_t sbdf);
 
 /* interrupt remapping */
 bool cf_check iov_supports_xt(void);
diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c
index 025d9be40f..9e4fbee953 100644
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -239,17 +239,17 @@ static int __init register_range_for_device(
 unsigned int bdf, paddr_t base, paddr_t limit,
 bool iw, bool ir, bool exclusion)
 {
-int seg = 0; /* XXX */
-struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg);
+pci_sbdf_t sbdf = { .seg = 0 /* XXX */, .bdf = bdf };
+struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(sbdf.seg);
 struct amd_iommu *iommu;
 u16 req;
 int rc = 0;
 
-iommu = find_iommu_for_device(seg, bdf);
+iommu = find_iommu_for_device(sbdf);
 if ( !iommu )
 {
 AMD_IOMMU_WARN("IVMD: no IOMMU for device %pp - ignoring constrain\n",
-   &PCI_SBDF(seg, bdf));
+   &sbdf);
 return 0;
 }
 req = ivrs_mappings[bdf].dte_requestor_id;
@@ -263,9 +263,9 @@ static int __init regis

[ImageBuilder] uboot-script-gen: fix arm64 xen u-boot image generation

2025-04-14 Thread Grygorii Strashko

From: Grygorii Strashko 

The current code in generate_uboot_images() does not detect arm64 properly
and always generates ARM u-boot image. This causes Xen boot issues.

Fix it by searching for "ARM64" for AArch64 binary detection.

- mkimage -l xen.ub
Before:
Image Type:   ARM Linux Kernel Image (uncompressed)

After:
Image Type:   AArch64 Linux Kernel Image (uncompressed)

Signed-off-by: Grygorii Strashko 
---
 scripts/uboot-script-gen | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/uboot-script-gen b/scripts/uboot-script-gen
index a9f698f00fd1..c4d26caf5e0e 100755
--- a/scripts/uboot-script-gen
+++ b/scripts/uboot-script-gen
@@ -815,13 +815,13 @@ function linux_config()
 
 generate_uboot_images()
 {
-local arch=$(file -L $XEN | grep "ARM")
+local arch=$(file -L $XEN | grep -o "ARM64")
 
 if test "$arch"
 then
-arch=arm
-else
 arch=arm64
+else
+arch=arm
 fi
 
 mkimage -A $arch -T kernel -C none -a $memaddr -e $memaddr -d $XEN "$XEN".ub
-- 
2.34.1

[ImageBuilder] uboot-script-gen: add xen xsm policy loading support

2025-04-14 Thread Grygorii Strashko

From: Grygorii Strashko 

This patch adds Xen XSM policy loading support.

The configuration file XEN_POLICY specifies Xen hypervisor
XSM policy binary to load.

Signed-off-by: Grygorii Strashko 
---
 README.md|  2 ++
 scripts/uboot-script-gen | 33 +
 2 files changed, 35 insertions(+)

diff --git a/README.md b/README.md
index 137abef153ce..9106d2a07302 100644
--- a/README.md
+++ b/README.md
@@ -91,6 +91,8 @@ Where:
 - XEN specifies the Xen hypervisor binary to load. Note that it has to
   be a regular Xen binary, not a u-boot binary.
 
+- XEN_POLICY specifies the Xen hypervisor XSM policy binary to load.
+
 - XEN_COLORS specifies the colors (cache coloring) to be used for Xen
   and is in the format startcolor-endcolor
 
diff --git a/scripts/uboot-script-gen b/scripts/uboot-script-gen
index c4d26caf5e0e..343eba20e4d9 100755
--- a/scripts/uboot-script-gen
+++ b/scripts/uboot-script-gen
@@ -315,6 +315,15 @@ function xen_device_tree_editing()
 dt_set "/chosen" "#size-cells" "hex" "0x2"
 dt_set "/chosen" "xen,xen-bootargs" "str" "$XEN_CMD"
 
+if test "$XEN_POLICY" && test $xen_policy_addr != "-"
+then
+local node_name="xen-policy@${xen_policy_addr#0x}"
+
+dt_mknode "/chosen" "$node_name"
+dt_set "/chosen/$node_name" "compatible" "str_a" "xen,xsm-policy xen,multiboot-module multiboot,module"
+dt_set "/chosen/$node_name" "reg" "hex" "$(split_addr_size $xen_policy_addr $xen_policy_size)"
+fi
+
 if test "$DOM0_KERNEL"
 then
 local node_name="dom0@${dom0_kernel_addr#0x}"
@@ -900,6 +909,14 @@ xen_file_loading()
 kernel_addr=$memaddr
 kernel_path=$XEN
 load_file "$XEN" "host_kernel"
+
+xen_policy_addr="-"
+if test "$XEN_POLICY"
+then
+xen_policy_addr=$memaddr
+load_file "$XEN_POLICY" "xen_policy"
+xen_policy_size=$filesize
+fi
 }
 
 linux_file_loading()
@@ -939,6 +956,22 @@ bitstream_load_and_config()
 
 create_its_file_xen()
 {
+if test "$XEN_POLICY" && test $xen_policy_addr != "-"
+then
+cat >> "$its_file" <<- EOF
+xen_policy {
+description = "Xen XSM policy binary";
+data = /incbin/("$XEN_POLICY");
+type = "kernel";
+arch = "arm64";
+os = "linux";
+compression = "none";
+load = <$xen_policy_addr>;
+$fit_algo
+};
+   EOF
+fi
+
 if test "$DOM0_KERNEL"
 then
 if test "$ramdisk_addr" != "-"
-- 
2.34.1

Re: [PATCH v6 1/3] xen/arm: Move some of the functions to common file

2025-04-14 Thread Orzel, Michal




On 11/04/2025 13:04, Ayan Kumar Halder wrote:
> regions.inc is added to hold the common earlyboot MPU regions configuration
NIT: I mentioned this a few times already. Please use imperative mood in the
commit message.

> between arm64 and arm32.
> 
> prepare_xen_region, fail_insufficient_regions() will be used by both arm32 and
> arm64. Thus, they have been moved to regions.inc.
> 
> *_PRBAR are moved to arm64/sysregs.h.
> *_PRLAR are moved to regions.inc as they are common between arm32 and arm64.
> 
> Introduce WRITE_SYSREG_ASM to write to the system registers from regions.inc.
> 
> Signed-off-by: Ayan Kumar Halder 
> Reviewed-by: Luca Fancellu 
Reviewed-by: Michal Orzel 

~Michal

[PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline

2025-04-14 Thread Penny Zheng

Users need to set "cpufreq=amd-cppc" in xen cmdline to enable
amd-cppc driver, which selects ACPI Collaborative Performance
and Power Control (CPPC) on supported AMD hardware to provide a
finer grained frequency control mechanism.
`verbose` option can also be included to support verbose print.

When users set "cpufreq=amd-cppc", a new amd-cppc driver
shall be registered and used. All hooks for amd-cppc driver are missing
until commit "xen/x86: introduce a new amd cppc driver for cpufreq scaling"

Xen is not expected to support both or mixed mode (CPPC & legacy PSS, _PCT,
_PPC) operations, so only one cpufreq driver gets registered, either amd-cppc
or legacy P-states driver, which is reflected and asserted by the incompatible
flags XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC.
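
A minimal sketch of the mutual-exclusion idea, assuming the two modes are
tracked as distinct bits in one flags word. The bit positions below are
chosen for the sketch only and are not taken from the Xen headers; the
demo_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define DEMO_PM_PX   (1u << 1)
#define DEMO_PM_CPPC (1u << 3)
#define DEMO_PM_BOTH (DEMO_PM_PX | DEMO_PM_CPPC)

static_assert((DEMO_PM_PX & DEMO_PM_CPPC) == 0, "flags must be distinct bits");

/* Record the registered driver's mode, refusing anything but a single mode. */
static int demo_select_mode(uint32_t *pm_flags, uint32_t mode)
{
    assert(pm_flags);

    if ( mode != DEMO_PM_PX && mode != DEMO_PM_CPPC )
        return -EINVAL;

    /* Clear both bits first so that exactly one of them ends up set. */
    *pm_flags = (*pm_flags & ~DEMO_PM_BOTH) | mode;

    return 0;
}

/* True unless both mode bits are set at the same time. */
static bool demo_modes_consistent(uint32_t pm_flags)
{
    return (pm_flags & DEMO_PM_BOTH) != DEMO_PM_BOTH;
}
```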

Signed-off-by: Penny Zheng 
---
v1 -> v2:
- Obey to alphabetic sorting and also strict it with CONFIG_AMD
- Remove unnecessary empty comment line
- Use __initconst_cf_clobber for pre-filled structure cpufreq_driver
- Make new switch-case code apply to Hygon CPUs too
- Change ENOSYS with EOPNOTSUPP
- Blanks around binary operator
- Change all amd_/-pstate defined values to amd_/-cppc
---
v2 -> v3
- refactor too long lines
- Make sure XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC are incompatible flags
after cpufreq driver registration
---
v3 -> v4:
- introduce XEN_PROCESSOR_PM_CPPC in xen internal header
- complement "Hygon" in log message
- remove unnecessary if()
- grow cpufreq_xen_opts[] array
---
 docs/misc/xen-command-line.pandoc |  7 +-
 xen/arch/x86/acpi/cpufreq/Makefile|  1 +
 xen/arch/x86/acpi/cpufreq/acpi.c  | 14 +++-
 xen/arch/x86/acpi/cpufreq/amd-cppc.c  | 81 +++
 xen/arch/x86/acpi/cpufreq/cpufreq.c   | 34 +-
 xen/arch/x86/platform_hypercall.c | 11 +++
 xen/drivers/cpufreq/cpufreq.c | 15 -
 xen/include/acpi/cpufreq/cpufreq.h|  6 +-
 xen/include/acpi/cpufreq/processor_perf.h |  3 +
 xen/include/public/sysctl.h   |  1 +
 10 files changed, 166 insertions(+), 7 deletions(-)
 create mode 100644 xen/arch/x86/acpi/cpufreq/amd-cppc.c

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 89db6e83be..9ef847a336 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -515,7 +515,7 @@ If set, force use of the performance counters for oprofile, 
rather than detectin
 available support.
 
 ### cpufreq
-> `= none | {{  | xen } { 
[:[powersave|performance|ondemand|userspace][,[]][,[]]] } 
[,verbose]} | dom0-kernel | hwp[:[][,verbose]]`
+> `= none | {{  | xen } { 
[:[powersave|performance|ondemand|userspace][,[]][,[]]] } 
[,verbose]} | dom0-kernel | hwp[:[][,verbose]] | amd-cppc[:[verbose]]`
 
 > Default: `xen`
 
@@ -526,7 +526,7 @@ choice of `dom0-kernel` is deprecated and not supported by 
all Dom0 kernels.
 * `` and `` are integers which represent max and min 
processor frequencies
   respectively.
 * `verbose` option can be included as a string or also as `verbose=`
-  for `xen`.  It is a boolean for `hwp`.
+  for `xen`.  It is a boolean for `hwp` and `amd-cppc`.
 * `hwp` selects Hardware-Controlled Performance States (HWP) on supported Intel
   hardware.  HWP is a Skylake+ feature which provides better CPU power
   management.  The default is disabled.  If `hwp` is selected, but hardware
@@ -534,6 +534,9 @@ choice of `dom0-kernel` is deprecated and not supported by 
all Dom0 kernels.
 * `` is a boolean to enable Hardware Duty Cycling (HDC).  HDC enables the
   processor to autonomously force physical package components into idle state.
   The default is enabled, but the option only applies when `hwp` is enabled.
+* `amd-cppc` selects ACPI Collaborative Performance and Power Control (CPPC)
+  on supported AMD hardware to provide finer grained frequency control
+  mechanism. The default is disabled.
 
 There is also support for `;`-separated fallback options:
 `cpufreq=hwp;xen,verbose`.  This first tries `hwp` and falls back to `xen` if
diff --git a/xen/arch/x86/acpi/cpufreq/Makefile b/xen/arch/x86/acpi/cpufreq/Makefile
index e7dbe434a8..a2ba34bda0 100644
--- a/xen/arch/x86/acpi/cpufreq/Makefile
+++ b/xen/arch/x86/acpi/cpufreq/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_INTEL) += acpi.o
+obj-$(CONFIG_AMD) += amd-cppc.o
 obj-y += cpufreq.o
 obj-$(CONFIG_INTEL) += hwp.o
 obj-$(CONFIG_AMD) += powernow.o
diff --git a/xen/arch/x86/acpi/cpufreq/acpi.c b/xen/arch/x86/acpi/cpufreq/acpi.c
index 0c25376406..e0cea9425f 100644
--- a/xen/arch/x86/acpi/cpufreq/acpi.c
+++ b/xen/arch/x86/acpi/cpufreq/acpi.c
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -514,5 +515,16 @@ acpi_cpufreq_driver = {
 
 int __init acpi_cpufreq_register(void)
 {
-return cpufreq_register_driver(&acpi_cpufreq_driver);
+int ret;
+
+ret = cpufreq_register_driver(&acpi_cpufreq_driver);
+if ( ret )
+return ret;
+/*
+ * After cpufreq driver registe

[PATCH v4 02/15] xen/cpufreq: extract _PSD info from "struct xen_processor_performance"

2025-04-14 Thread Penny Zheng

Since we need to re-use _PSD info, containing "shared_type" and
"struct xen_psd_package", for CPPC mode, we move all
"#define XEN_CPUPERF_SHARED_TYPE_xxx" up as common values, and introduce
a new helper check_psd_pminfo() to wrap _PSD info check.

In cpufreq_add/del_cpu(), a new helper get_psd_info() is introduced to
extract "shared_type" and "struct xen_psd_package" from
"struct xen_processor_performance", and a few indentation issues get fixed at
the same time.

Signed-off-by: Penny Zheng 
---
v3 -> v4:
- new commit
---
 xen/drivers/cpufreq/cpufreq.c | 107 --
 xen/include/public/platform.h |  10 ++--
 2 files changed, 82 insertions(+), 35 deletions(-)

diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index b01ed8e294..b020ccbcf7 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -191,9 +191,31 @@ int cpufreq_limit_change(unsigned int cpu)
 return __cpufreq_set_policy(data, &policy);
 }
 
-int cpufreq_add_cpu(unsigned int cpu)
+static int get_psd_info(uint32_t init, unsigned int cpu,
+uint32_t *shared_type,
+struct xen_psd_package *domain_info)
 {
 int ret = 0;
+
+switch ( init )
+{
+case XEN_PX_INIT:
+if ( shared_type )
+*shared_type = processor_pminfo[cpu]->perf.shared_type;
+if ( domain_info )
+*domain_info = processor_pminfo[cpu]->perf.domain_info;
+break;
+default:
+ret = -EINVAL;
+break;
+}
+
+return ret;
+}
+
+int cpufreq_add_cpu(unsigned int cpu)
+{
+int ret;
 unsigned int firstcpu;
 unsigned int dom, domexist = 0;
 unsigned int hw_all = 0;
@@ -201,14 +223,13 @@ int cpufreq_add_cpu(unsigned int cpu)
 struct cpufreq_dom *cpufreq_dom = NULL;
 struct cpufreq_policy new_policy;
 struct cpufreq_policy *policy;
-struct processor_performance *perf;
+struct xen_psd_package domain_info;
+uint32_t shared_type;
 
 /* to protect the case when Px was not controlled by xen */
 if ( !processor_pminfo[cpu] || !cpu_online(cpu) )
 return -EINVAL;
 
-perf = &processor_pminfo[cpu]->perf;
-
 if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
 return -EINVAL;
 
@@ -218,10 +239,15 @@ int cpufreq_add_cpu(unsigned int cpu)
 if (per_cpu(cpufreq_cpu_policy, cpu))
 return 0;
 
-if (perf->shared_type == CPUFREQ_SHARED_TYPE_HW)
+ret = get_psd_info(processor_pminfo[cpu]->init, cpu,
+   &shared_type, &domain_info);
+if ( ret )
+return ret;
+
+if ( shared_type == CPUFREQ_SHARED_TYPE_HW )
 hw_all = 1;
 
-dom = perf->domain_info.domain;
+dom = domain_info.domain;
 
 list_for_each(pos, &cpufreq_dom_list_head) {
 cpufreq_dom = list_entry(pos, struct cpufreq_dom, node);
@@ -244,20 +270,27 @@ int cpufreq_add_cpu(unsigned int cpu)
 cpufreq_dom->dom = dom;
 list_add(&cpufreq_dom->node, &cpufreq_dom_list_head);
 } else {
+uint32_t firstcpu_shared_type;
+struct xen_psd_package firstcpu_domain_info;
+
 /* domain sanity check under whatever coordination type */
 firstcpu = cpumask_first(cpufreq_dom->map);
-if ((perf->domain_info.coord_type !=
-processor_pminfo[firstcpu]->perf.domain_info.coord_type) ||
-(perf->domain_info.num_processors !=
-processor_pminfo[firstcpu]->perf.domain_info.num_processors)) {
-
+ret = get_psd_info(processor_pminfo[firstcpu]->init, firstcpu,
+   &firstcpu_shared_type, &firstcpu_domain_info);
+if ( ret )
+return ret;
+
+if ( (domain_info.coord_type != firstcpu_domain_info.coord_type) ||
+ (domain_info.num_processors !=
+  firstcpu_domain_info.num_processors) )
+{
 printk(KERN_WARNING "cpufreq fail to add CPU%d:"
"incorrect _PSD(%"PRIu64":%"PRIu64"), "
"expect(%"PRIu64"/%"PRIu64")\n",
-   cpu, perf->domain_info.coord_type,
-   perf->domain_info.num_processors,
-   processor_pminfo[firstcpu]->perf.domain_info.coord_type,
-   processor_pminfo[firstcpu]->perf.domain_info.num_processors
+   cpu, domain_info.coord_type,
+   domain_info.num_processors,
+   firstcpu_domain_info.coord_type,
+   firstcpu_domain_info.num_processors
 );
 return -EINVAL;
 }
@@ -304,8 +337,9 @@ int cpufreq_add_cpu(unsigned int cpu)
 if (ret)
 goto err1;
 
-if (hw_all || (cpumask_weight(cpufreq_dom->map) ==
-   perf->domain_info.num_processors)) {
+if ( hw_all || (cpumask_weight(cpufreq_dom->map) ==
+domain_info.num_processors) )
+{
 memcpy(&new_policy, policy, sizeof(struct cpufreq_policy));
 policy-

[PATCH v4 03/15] xen/x86: introduce new sub-hypercall to propagate CPPC data

2025-04-14 Thread Penny Zheng

In order to provide backward compatibility with existing governors
that represent performance as frequency, like ondemand, the _CPC
table can optionally provide processor frequency range values, Lowest
frequency and Nominal frequency, to let the OS use Lowest Frequency/
Performance and Nominal Frequency/Performance as anchor points to
create linear mapping of CPPC abstract performance to CPU frequency.
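
As a sketch of that linear mapping, assuming both anchor pairs are available
and frequencies are expressed in MHz: the conversion is the line through
(lowest_perf, lowest_mhz) and (nominal_perf, nominal_mhz). The structure and
function names are hypothetical, and inputs are assumed small enough that
the intermediate product fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two anchor points taken from _CPC. */
struct demo_cppc_anchors {
    uint32_t lowest_perf, nominal_perf;   /* abstract performance units */
    uint32_t lowest_mhz, nominal_mhz;     /* frequencies in MHz */
};

/*
 * Map an abstract performance value to MHz along the line through the two
 * anchors. Returns 0 when the anchors cannot define a usable line.
 */
static uint32_t demo_perf_to_mhz(const struct demo_cppc_anchors *a,
                                 uint32_t perf)
{
    int64_t dperf, dfreq, mhz;

    assert(a);

    if ( a->nominal_perf <= a->lowest_perf || a->nominal_mhz < a->lowest_mhz )
        return 0;

    dperf = (int64_t)a->nominal_perf - a->lowest_perf;
    dfreq = (int64_t)a->nominal_mhz - a->lowest_mhz;

    /* freq = lowest_mhz + (perf - lowest_perf) * slope, in integer math. */
    mhz = a->lowest_mhz + ((int64_t)perf - a->lowest_perf) * dfreq / dperf;

    return mhz < 0 ? 0 : (uint32_t)mhz;
}
```

Values above nominal performance (boost levels) are extrapolated along the
same line in this sketch; whether that is appropriate is a policy choice.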

As Xen is incapable of parsing the ACPI dynamic table, we'd like to
introduce a new sub-hypercall "XEN_PM_CPPC" to propagate required CPPC
data from dom0 kernel to Xen.
In the corresponding handler set_cppc_pminfo(), we do _CPC and _PSD
sanitization check, as both _PSD and _CPC info are necessary for correctly
initializing cpufreq cores in CPPC mode.
Users shall be warned that if this fails, there is no fallback scheme,
such as legacy P-states, to switch to.
A new flag "XEN_CPPC_INIT" is also introduced to differentiate such cores
from cpufreq cores initialised in Px mode.

Signed-off-by: Penny Zheng 
---
v1 -> v2:
- Remove unnecessary figure braces
- Pointer-to-const for print_CPPC and set_cppc_pminfo
- Structure allocation shall use xvzalloc()
- Unnecessary memcpy(), and change it to a (type safe) structure assignment
- Add comment for struct xen_processor_cppc, and keep the chosen fields
in the order _CPC has them
- Obey to alphabetic sorting, and prefix compat structures with ? instead
of !
---
v2 -> v3:
- Trim too long line
- Re-place set_cppc_pminfo() past set_px_pminfo()
- Fix Misra violations: Declaration and definition ought to agree
in parameter names
- Introduce a new flag XEN_PM_CPPC to reflect processor initialised in CPPC
mode
---
v3 -> v4:
- Refactor commit message
- make "acpi_id" unsigned int
- Add warning message when cpufreq_cpu_init() failed only under debug mode
- Expand "struct xen_processor_cppc" to include _PSD and shared type
- add sanity check for ACPI CPPC data
---
 xen/arch/x86/platform_hypercall.c |   5 +
 xen/drivers/cpufreq/cpufreq.c | 131 --
 xen/include/acpi/cpufreq/processor_perf.h |   4 +-
 xen/include/public/platform.h |  26 +
 xen/include/xen/pmstat.h  |   2 +
 xen/include/xlat.lst  |   1 +
 6 files changed, 161 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 90abd3197f..49717e9ca9 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -572,6 +572,11 @@ ret_t do_platform_op(
 break;
 }
 
+case XEN_PM_CPPC:
+ret = set_cppc_pminfo(op->u.set_pminfo.id,
+  &op->u.set_pminfo.u.cppc_data);
+break;
+
 default:
 ret = -EINVAL;
 break;
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index b020ccbcf7..e01acc0c2d 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -205,6 +206,12 @@ static int get_psd_info(uint32_t init, unsigned int cpu,
 if ( domain_info )
 *domain_info = processor_pminfo[cpu]->perf.domain_info;
 break;
+case XEN_CPPC_INIT:
+if ( shared_type )
+*shared_type = processor_pminfo[cpu]->cppc_data.shared_type;
+if ( domain_info )
+*domain_info = processor_pminfo[cpu]->cppc_data.domain_info;
+break;
 default:
 ret = -EINVAL;
 break;
@@ -230,7 +237,7 @@ int cpufreq_add_cpu(unsigned int cpu)
 if ( !processor_pminfo[cpu] || !cpu_online(cpu) )
 return -EINVAL;
 
-if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
+if ( !(processor_pminfo[cpu]->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
 return -EINVAL;
 
 if (!cpufreq_driver.init)
@@ -401,7 +408,7 @@ int cpufreq_del_cpu(unsigned int cpu)
 if ( !processor_pminfo[cpu] || !cpu_online(cpu) )
 return -EINVAL;
 
-if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
+if ( !(processor_pminfo[cpu]->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
 return -EINVAL;
 
 if (!per_cpu(cpufreq_cpu_policy, cpu))
@@ -497,12 +504,19 @@ static void print_PPC(unsigned int platform_limit)
 printk("\t_PPC: %d\n", platform_limit);
 }
 
-static int check_psd_pminfo(const struct xen_processor_performance *perf)
+static int check_psd_pminfo(const struct xen_processor_performance *perf,
+const struct xen_processor_cppc *cppc_data)
 {
+uint32_t shared_type;
+
+if ( !perf && !cppc_data )
+return -EINVAL;
+
+shared_type = perf ? perf->shared_type : cppc_data->shared_type;
 /* check domain coordination */
-if ( perf->shared_type != CPUFREQ_SHARED_TYPE_ALL &&
- perf->shared_type != CPUFREQ_SHARED_TYPE_ANY &&
- perf->shared_type != CPUFREQ_SHARED_TYPE_HW )
+if ( shared_type != CPUFREQ_SHARED_TYPE_

[PATCH v4 08/15] xen/amd: introduce amd_process_freq() to get processor frequency

2025-04-14 Thread Penny Zheng

When the _CPC table cannot provide processor frequency range
values for the Xen governor, we need to read the processor max frequency
as the anchor point.
So we extract the AMD CPU core frequency calculation logic from amd_log_freq(),
and wrap it in a new helper, amd_process_freq().
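
A caller that only needs the max frequency could use the helper's optional out-parameters roughly as below. This is a sketch: ex_process_freq() is a stand-in with hard-coded values, not the real amd_process_freq(), and the EX_ macro is local to the example.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_INVAL_FREQ_MHZ (~(uint64_t)0)

/*
 * Stand-in for amd_process_freq(): fills only the outputs the caller
 * asked for. The values are hard-coded for the sake of the example.
 */
void ex_process_freq(uint64_t *low_mhz, uint64_t *nom_mhz, uint64_t *hi_mhz)
{
    if ( low_mhz )
        *low_mhz = 400;
    if ( nom_mhz )
        *nom_mhz = 3000;
    if ( hi_mhz )
        *hi_mhz = 4200;
}

/* A caller interested in the max frequency passes NULL for the rest. */
uint64_t ex_max_freq_mhz(void)
{
    uint64_t hi = EX_INVAL_FREQ_MHZ;

    ex_process_freq(NULL, NULL, &hi);

    /* Still EX_INVAL_FREQ_MHZ if the helper could not determine it. */
    return hi;
}
```

Pre-setting the output to the sentinel lets the caller tell "not filled in" apart from a real frequency, which is the same convention the new amd_log_freq() relies on.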

Signed-off-by: Penny Zheng 
---
v1 -> v2:
- new commit
---
v3 -> v4
- introduce amd_process_freq()
---
 xen/arch/x86/cpu/amd.c | 60 +++---
 xen/arch/x86/include/asm/amd.h |  4 +++
 2 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index f93dda927e..e875014de9 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -57,7 +57,6 @@ bool __initdata amd_virt_spec_ctrl;
 static bool __read_mostly fam17_c6_disabled;
 
 static uint64_t attr_const amd_parse_freq(unsigned char c, uint64_t value);
-#define INVAL_FREQ_MHZ  ~(uint64_t)0
 
 static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
 unsigned int *hi)
@@ -596,14 +595,13 @@ static uint64_t amd_parse_freq(unsigned char c, uint64_t value)
return freq;
 }
 
-void amd_log_freq(const struct cpuinfo_x86 *c)
+void amd_process_freq(const struct cpuinfo_x86 *c,
+ uint64_t *low_mhz, uint64_t *nom_mhz, uint64_t *hi_mhz)
 {
unsigned int idx = 0, h;
uint64_t hi, lo, val;
 
-   if (c->x86 < 0x10 || c->x86 > 0x1A ||
-   (c != &boot_cpu_data &&
-(!opt_cpu_info || (c->apicid & (c->x86_num_siblings - 1)))))
+   if (c->x86 < 0x10 || c->x86 > 0x1A)
return;
 
if (c->x86 < 0x17) {
@@ -684,20 +682,21 @@ void amd_log_freq(const struct cpuinfo_x86 *c)
 
if (idx && idx < h &&
!rdmsr_safe(0xC0010064 + idx, val) && (val >> 63) &&
-   !rdmsr_safe(0xC0010064, hi) && (hi >> 63))
-   printk("CPU%u: %lu (%lu ... %lu) MHz\n",
-  smp_processor_id(),
-  amd_parse_freq(c->x86, val),
-  amd_parse_freq(c->x86, lo),
-  amd_parse_freq(c->x86, hi));
-   else if (h && !rdmsr_safe(0xC0010064, hi) && (hi >> 63))
-   printk("CPU%u: %lu ... %lu MHz\n",
-  smp_processor_id(),
-  amd_parse_freq(c->x86, lo),
-  amd_parse_freq(c->x86, hi));
-   else
-   printk("CPU%u: %lu MHz\n", smp_processor_id(),
-  amd_parse_freq(c->x86, lo));
+   !rdmsr_safe(0xC0010064, hi) && (hi >> 63)) {
+   if (nom_mhz)
+   *nom_mhz = amd_parse_freq(c->x86, val);
+   if (low_mhz)
+   *low_mhz = amd_parse_freq(c->x86, lo);
+   if (hi_mhz)
+   *hi_mhz = amd_parse_freq(c->x86, hi);
+   } else if (h && !rdmsr_safe(0xC0010064, hi) && (hi >> 63)) {
+   if (low_mhz)
+   *low_mhz = amd_parse_freq(c->x86, lo);
+   if (hi_mhz)
+   *hi_mhz = amd_parse_freq(c->x86, hi);
+   } else
+   if (low_mhz)
+   *low_mhz = amd_parse_freq(c->x86, lo);
 }
 
 void cf_check early_init_amd(struct cpuinfo_x86 *c)
@@ -708,6 +707,29 @@ void cf_check early_init_amd(struct cpuinfo_x86 *c)
ctxt_switch_levelling(NULL);
 }
 
+void amd_log_freq(const struct cpuinfo_x86 *c)
+{
+   uint64_t low_mhz, nom_mhz, hi_mhz;
+
+   if (c != &boot_cpu_data &&
+   (!opt_cpu_info || (c->apicid & (c->x86_num_siblings - 1))))
+   return;
+
+   low_mhz = nom_mhz = hi_mhz = INVAL_FREQ_MHZ;
+   amd_process_freq(c, &low_mhz, &nom_mhz, &hi_mhz);
+
+   if (low_mhz != INVAL_FREQ_MHZ && nom_mhz != INVAL_FREQ_MHZ &&
+   hi_mhz != INVAL_FREQ_MHZ)
+   printk("CPU%u: %lu (%lu ... %lu) MHz\n",
+  smp_processor_id(),
+  nom_mhz, low_mhz, hi_mhz);
+   else if (low_mhz != INVAL_FREQ_MHZ && hi_mhz != INVAL_FREQ_MHZ)
+   printk("CPU%u: %lu ... %lu MHz\n",
+  smp_processor_id(), low_mhz, hi_mhz);
+   else if (low_mhz != INVAL_FREQ_MHZ)
+   printk("CPU%u: %lu MHz\n", smp_processor_id(), low_mhz);
+}
+
 void amd_init_lfence(struct cpuinfo_x86 *c)
 {
uint64_t value;
diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
index 9c9599a622..9dd3592bbb 100644
--- a/xen/arch/x86/include/asm/amd.h
+++ b/xen/arch/x86/include/asm/amd.h
@@ -174,4 +174,8 @@ bool amd_setup_legacy_ssbd(void);
 void amd_set_legacy_ssbd(bool enable);
 void amd_set_cpuid_user_dis(bool enable);
 
+#define INVAL_FREQ_MHZ  ~(uint64_t)0
+void amd_process_freq(const struct cpuinfo_x86 *c, uint64_t *low_mhz,
+ uint64_t *nom_mhz, uint64_t *hi_mhz);
+
 #endif /* __AMD_H__ */
-- 
2.34.1

[PATCH v4 00/15] amd-cppc CPU Performance Scaling Driver

2025-04-14 Thread Penny Zheng

amd-cppc is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism on modern AMD APU and CPU series in
Xen. The new mechanism is based on Collaborative Processor Performance
Control (CPPC), which provides finer-grained frequency management than
legacy ACPI hardware P-States. Current AMD CPU/APU platforms use
the ACPI P-states driver to manage CPU frequency and clocks,
switching among only 3 P-states. CPPC replaces the ACPI P-states controls
and allows a flexible, low-latency interface for Xen to directly
communicate the performance hints to hardware.

The amd-cppc driver has 2 operation modes: autonomous (active) mode
and non-autonomous (passive) mode. We register a different CPUFreq driver
for each mode: "amd-cppc" for passive mode and "amd-cppc-epp"
for active mode.

The passive mode leverages common governors such as *ondemand*,
*performance*, etc., to manage the performance hints. The active mode
uses EPP to provide a hint to the hardware on whether software wants to bias
toward performance (0x0) or energy efficiency (0xff). The CPPC power algorithm
in hardware automatically evaluates the runtime workload and adjusts the
real-time CPU core frequency according to power supply, thermal, core
voltage, and other hardware conditions.
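
The EPP bias described above can be pictured with a small sketch. The enum and function are hypothetical; only the two end points (0x0 for performance, 0xff for energy efficiency) come from the text, and the mid-point used for "balance" is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical profile names for the purpose of this example. */
enum ex_profile { EX_PERFORMANCE, EX_BALANCE, EX_POWERSAVE };

/* Translate a profile into an EPP hint: 0x00 = performance, 0xff = efficiency. */
uint8_t ex_profile_to_epp(enum ex_profile p)
{
    switch ( p )
    {
    case EX_PERFORMANCE:
        return 0x00;

    case EX_POWERSAVE:
        return 0xff;

    case EX_BALANCE:
    default:
        return 0x80; /* assumed mid-point, not taken from the series */
    }
}
```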

amd-cppc is enabled in passive mode with a top-level `cpufreq=amd-cppc` option,
while users add an extra `active` flag to select active mode.

With `cpufreq=amd-cppc,active`, we ran a 60s sampling test to observe the CPU
frequency change while switching the energy_perf preference from
`xenpm set-cpufreq-cppc powersave` to `xenpm set-cpufreq-cppc performance`.
The outputs are as follows:
```
Setting CPU in powersave mode
Sampling and Outputs:
  Avg freq  200 KHz
  Avg freq  200 KHz
  Avg freq  200 KHz
Setting CPU in performance mode
Sampling and Outputs:
  Avg freq  464 KHz
  Avg freq  422 KHz
  Avg freq  464 KHz
```

Penny Zheng (15):
  xen/cpufreq: move "init" flag into common structure
  xen/cpufreq: extract _PSD info from "struct xen_processor_performance"
  xen/x86: introduce new sub-hypercall to propagate CPPC data
  xen/cpufreq: refactor cmdline "cpufreq=xxx"
  xen/x86: introduce "cpufreq=amd-cppc" xen cmdline
  xen/cpufreq: disable px statistic info in amd-cppc mode
  xen/cpufreq: fix core frequency calculation for AMD Family 1Ah CPUs
  xen/amd: introduce amd_process_freq() to get processor frequency
  xen/x86: introduce a new amd cppc driver for cpufreq scaling
  xen/cpufreq: only set gov NULL when cpufreq_driver.setpolicy is NULL
  xen/x86: implement EPP support for the amd-cppc driver in active mode
  tools/xenpm: Print CPPC parameters for amd-cppc driver
  tools/xenpm: fix unnecessary scaling_available_frequencies in CPPC
mode
  tools/xenpm: remove px_cap dependency check for average frequency
  xen/xenpm: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc
driver

 docs/misc/xen-command-line.pandoc |  13 +-
 tools/libs/ctrl/xc_pm.c   |  45 +-
 tools/misc/xenpm.c|  20 +-
 xen/arch/x86/acpi/cpufreq/Makefile|   1 +
 xen/arch/x86/acpi/cpufreq/acpi.c  |  14 +-
 xen/arch/x86/acpi/cpufreq/amd-cppc.c  | 708 ++
 xen/arch/x86/acpi/cpufreq/cpufreq.c   |  34 +-
 xen/arch/x86/cpu/amd.c|  81 ++-
 xen/arch/x86/include/asm/amd.h|   4 +
 xen/arch/x86/include/asm/msr-index.h  |   6 +
 xen/arch/x86/platform_hypercall.c |  16 +
 xen/drivers/acpi/pmstat.c |  42 +-
 xen/drivers/cpufreq/cpufreq.c | 306 --
 xen/drivers/cpufreq/utility.c |  14 +
 xen/include/acpi/cpufreq/cpufreq.h|  22 +-
 xen/include/acpi/cpufreq/processor_perf.h |  11 +-
 xen/include/public/platform.h |  36 +-
 xen/include/public/sysctl.h   |   2 +
 xen/include/xen/pmstat.h  |   2 +
 xen/include/xlat.lst  |   1 +
 20 files changed, 1268 insertions(+), 110 deletions(-)
 create mode 100644 xen/arch/x86/acpi/cpufreq/amd-cppc.c

-- 
2.34.1

[PATCH v4 07/15] xen/cpufreq: fix core frequency calculation for AMD Family 1Ah CPUs

2025-04-14 Thread Penny Zheng

AMD Family 1Ah CPUs need a different COF (Core Operating Frequency) formula,
due to a change in the PStateDef MSR layout in AMD Family 1Ah.
In AMD Family 1Ah, the core current operating frequency in MHz is calculated as
follows:
CoreCOF = Core::X86::Msr::PStateDef[CpuFid[11:0]] * 5MHz
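
As a worked example of the formula above: a PStateDef value whose CpuFid field is 0x320 (800 decimal) yields 800 * 5 = 4000 MHz. A minimal sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* Family 1Ah CoreCOF in MHz: CpuFid is bits 11:0 of PStateDef. */
uint64_t ex_fam1a_core_cof_mhz(uint64_t pstate_def)
{
    return (pstate_def & 0xfff) * 5;
}
```

Bits outside CpuFid[11:0], such as the enable bit 63 tested elsewhere in amd_log_freq(), do not affect the result.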

We introduce a helper amd_parse_freq() to parse the CPU min/nominal/max core
frequency from the PStateDef register, replacing the original macro FREQ(v).
amd_parse_freq() is declared as const, as it consists purely of mathematical
computation.

Signed-off-by: Penny Zheng 
---
v2 -> v3:
- new commit
---
v3 -> v4:
 - introduce amd_parse_freq() and declare it as const
 - express if-else-array() as switch()
---
 xen/arch/x86/cpu/amd.c | 43 +++---
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index ce4e1df710..f93dda927e 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -56,6 +56,9 @@ bool __initdata amd_virt_spec_ctrl;
 
 static bool __read_mostly fam17_c6_disabled;
 
+static uint64_t attr_const amd_parse_freq(unsigned char c, uint64_t value);
+#define INVAL_FREQ_MHZ  ~(uint64_t)0
+
 static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
 unsigned int *hi)
 {
@@ -570,12 +573,35 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
   : c->cpu_core_id);
 }
 
+static uint64_t amd_parse_freq(unsigned char c, uint64_t value)
+{
+   uint64_t freq = INVAL_FREQ_MHZ;
+
+   switch (c) {
+   case 0x10 ... 0x16:
+   freq = (((value & 0x3f) + 0x10) * 100) >> ((value >> 6) & 7);
+   break;
+   case 0x17 ... 0x19:
+   freq = ((value & 0xff) * 25 * 8) / ((value >> 8) & 0x3f);
+   break;
+   case 0x1A:
+   freq = (value & 0xfff) * 5;
+   break;
+   default:
+   printk(XENLOG_ERR
+  "Unsupported cpu family %#x on cpufreq parsing\n", c);
+   break;
+   }
+
+   return freq;
+}
+
 void amd_log_freq(const struct cpuinfo_x86 *c)
 {
unsigned int idx = 0, h;
uint64_t hi, lo, val;
 
-   if (c->x86 < 0x10 || c->x86 > 0x19 ||
+   if (c->x86 < 0x10 || c->x86 > 0x1A ||
(c != &boot_cpu_data &&
 (!opt_cpu_info || (c->apicid & (c->x86_num_siblings - 1)))))
return;
@@ -656,19 +682,22 @@ void amd_log_freq(const struct cpuinfo_x86 *c)
if (!(lo >> 63))
return;
 
-#define FREQ(v) (c->x86 < 0x17 ? ((((v) & 0x3f) + 0x10) * 100) >> (((v) >> 6) & 7) \
-: (((v) & 0xff) * 25 * 8) / (((v) >> 8) & 0x3f))
if (idx && idx < h &&
!rdmsr_safe(0xC0010064 + idx, val) && (val >> 63) &&
!rdmsr_safe(0xC0010064, hi) && (hi >> 63))
printk("CPU%u: %lu (%lu ... %lu) MHz\n",
-  smp_processor_id(), FREQ(val), FREQ(lo), FREQ(hi));
+  smp_processor_id(),
+  amd_parse_freq(c->x86, val),
+  amd_parse_freq(c->x86, lo),
+  amd_parse_freq(c->x86, hi));
else if (h && !rdmsr_safe(0xC0010064, hi) && (hi >> 63))
printk("CPU%u: %lu ... %lu MHz\n",
-  smp_processor_id(), FREQ(lo), FREQ(hi));
+  smp_processor_id(),
+  amd_parse_freq(c->x86, lo),
+  amd_parse_freq(c->x86, hi));
else
-   printk("CPU%u: %lu MHz\n", smp_processor_id(), FREQ(lo));
-#undef FREQ
+   printk("CPU%u: %lu MHz\n", smp_processor_id(),
+  amd_parse_freq(c->x86, lo));
 }
 
 void cf_check early_init_amd(struct cpuinfo_x86 *c)
-- 
2.34.1

[PATCH v4 06/15] xen/cpufreq: disable px statistic info in amd-cppc mode

2025-04-14 Thread Penny Zheng

We need to bypass construction of px statistic info in
cpufreq_statistic_init() for amd-cppc mode, as P-states are not needed there.

Signed-off-by: Penny Zheng 
---
v3 -> v4:
- remove unnecessary stub for cpufreq_statistic_exit()
---
 xen/drivers/cpufreq/utility.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index e690a484f1..b35e2eb1b6 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -98,6 +98,9 @@ int cpufreq_statistic_init(unsigned int cpu)
 if ( !pmpt )
 return -EINVAL;
 
+if ( !(pmpt->init & XEN_PX_INIT) )
+return 0;
+
 spin_lock(cpufreq_statistic_lock);
 
 pxpt = per_cpu(cpufreq_statistic_data, cpu);
-- 
2.34.1

[PATCH v4 15/15] xen/xenpm: Adapt SET/GET_CPUFREQ_CPPC xen_sysctl_pm_op for amd-cppc driver

2025-04-14 Thread Penny Zheng

Introduce helpers set_amd_cppc_para() and get_amd_cppc_para() to
SET/GET CPPC-related parameters for the amd-cppc/amd-cppc-epp drivers.

Signed-off-by: Penny Zheng 
---
v1 -> v2:
- Give the variable des_perf an initializer of 0
- Use the strncmp()s directly in the if()
---
v3 -> v4
- refactor comments
- remove double blank lines
- replace amd_cppc_in_use flag with XEN_PROCESSOR_PM_CPPC
---
 xen/arch/x86/acpi/cpufreq/amd-cppc.c | 121 +++
 xen/drivers/acpi/pmstat.c|  22 -
 xen/include/acpi/cpufreq/cpufreq.h   |   4 +
 3 files changed, 143 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
index 3a576fd4be..95d04bf77a 100644
--- a/xen/arch/x86/acpi/cpufreq/amd-cppc.c
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -540,6 +540,127 @@ static int cf_check amd_cppc_epp_set_policy(struct cpufreq_policy *policy)
 return 0;
 }
 
+int get_amd_cppc_para(unsigned int cpu,
+  struct xen_cppc_para *cppc_para)
+{
+const struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
+
+if ( data == NULL )
+return -ENODATA;
+
+cppc_para->features = 0;
+cppc_para->lowest   = data->caps.lowest_perf;
+cppc_para->lowest_nonlinear = data->caps.lowest_nonlinear_perf;
+cppc_para->nominal  = data->caps.nominal_perf;
+cppc_para->highest  = data->caps.highest_perf;
+cppc_para->minimum  = data->req.min_perf;
+cppc_para->maximum  = data->req.max_perf;
+cppc_para->desired  = data->req.des_perf;
+cppc_para->energy_perf  = data->req.epp;
+
+return 0;
+}
+
+int set_amd_cppc_para(const struct cpufreq_policy *policy,
+  const struct xen_set_cppc_para *set_cppc)
+{
+unsigned int cpu = policy->cpu;
+struct amd_cppc_drv_data *data = per_cpu(amd_cppc_drv_data, cpu);
+uint8_t max_perf, min_perf, des_perf = 0, epp;
+
+if ( data == NULL )
+return -ENOENT;
+
+/* Validate all parameters - Disallow reserved bits. */
+if ( set_cppc->minimum > UINT8_MAX || set_cppc->maximum > UINT8_MAX ||
+ set_cppc->desired > UINT8_MAX || set_cppc->energy_perf > UINT8_MAX )
+return -EINVAL;
+
+/* Only allow values if params bit is set. */
+if ( (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED) &&
+  set_cppc->desired) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
+  set_cppc->minimum) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
+  set_cppc->maximum) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF) &&
+  set_cppc->energy_perf) )
+return -EINVAL;
+
+/* Activity window not supported in MSR */
+if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW )
+return -EOPNOTSUPP;
+
+/* Return if there is nothing to do. */
+if ( set_cppc->set_params == 0 )
+return 0;
+
+epp = per_cpu(epp_init, cpu);
+/*
+ * Apply presets:
+ * XEN_SYSCTL_CPPC_SET_DESIRED reflects whether desired perf is set, which
+ * is also the flag to distinguish between passive mode and active mode.
+ * When it is set, CPPC is in passive mode and only
+ * XEN_SYSCTL_CPPC_SET_PRESET_NONE can be chosen, where min_perf =
+ * lowest_nonlinear_perf to ensure performance stays in the P-state range.
+ * When it is not set, CPPC is in active mode and three different profiles,
+ * XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE/PERFORMANCE/BALANCE, are provided,
+ * where min_perf = lowest_perf, covering the T-state range of performance.
+ */
+switch ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_PRESET_MASK )
+{
+case XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE:
+if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED )
+return -EINVAL;
+min_perf = data->caps.lowest_perf;
+/* Lower max frequency to nominal */
+max_perf = data->caps.nominal_perf;
+epp = CPPC_ENERGY_PERF_MAX_POWERSAVE;
+break;
+
+case XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE:
+if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED )
+return -EINVAL;
+/* Increase idle frequency to highest */
+min_perf = data->caps.highest_perf;
+max_perf = data->caps.highest_perf;
+epp = CPPC_ENERGY_PERF_MAX_PERFORMANCE;
+break;
+
+case XEN_SYSCTL_CPPC_SET_PRESET_BALANCE:
+if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED )
+return -EINVAL;
+min_perf = data->caps.lowest_perf;
+max_perf = data->caps.highest_perf;
+epp = CPPC_ENERGY_PERF_BALANCE;
+break;
+
+case XEN_SYSCTL_CPPC_SET_PRESET_NONE:
+min_perf = data->caps.lowest_nonlinear_perf;
+max_perf = data->caps.highest_perf;
+break;
+
+default:
+return -EINVAL;
+}
+
+/* Further customize presets if needed */
+if ( set_cppc-

[PATCH v4 12/15] tools/xenpm: Print CPPC parameters for amd-cppc driver

2025-04-14 Thread Penny Zheng

HWP, amd-cppc, and amd-cppc-epp are all implementations
of ACPI CPPC (Collaborative Processor Performance Control),
so we introduce a cppc_mode flag to print CPPC-related parameters.

HWP and amd-cppc-epp are both governor-less drivers,
so we introduce a hw_auto flag to bypass governor-related printing.
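
The relationship between the two flags can be summarised in a small stand-alone sketch. The EX_* driver name strings are assumptions for the example ("amd-cppc" and "amd-cppc-epp" follow the cover letter, "hwp" is assumed), and ex_classify_driver() is a hypothetical helper, not part of xenpm.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed driver name strings; the real macros live in Xen's headers. */
#define EX_HWP_DRIVER_NAME          "hwp"
#define EX_AMD_CPPC_DRIVER_NAME     "amd-cppc"
#define EX_AMD_CPPC_EPP_DRIVER_NAME "amd-cppc-epp"

struct ex_driver_kind {
    bool cppc_mode; /* driver implements ACPI CPPC: print CPPC parameters */
    bool hw_auto;   /* governor-less driver: skip governor-related output */
};

struct ex_driver_kind ex_classify_driver(const char *name)
{
    struct ex_driver_kind k = { false, false };
    bool hwp  = !strcmp(name, EX_HWP_DRIVER_NAME);
    bool cppc = !strcmp(name, EX_AMD_CPPC_DRIVER_NAME);
    bool epp  = !strcmp(name, EX_AMD_CPPC_EPP_DRIVER_NAME);

    k.cppc_mode = hwp || cppc || epp;
    k.hw_auto = hwp || epp;

    return k;
}
```

Note that amd-cppc in passive mode is the one case where both the CPPC parameters and the governor information are printed, which is why the former `else` branch becomes a separate `if ( !hw_auto )`.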

The validation check on `xenpm get-cpufreq-para` shall also consider
the CPPC scenario.

Signed-off-by: Penny Zheng 
---
v3 -> v4:
- Include validation check fix here
---
 tools/misc/xenpm.c| 18 ++
 xen/drivers/acpi/pmstat.c |  7 ---
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index db658ebadd..29fffebebd 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -790,9 +790,18 @@ static unsigned int calculate_activity_window(const xc_cppc_para_t *cppc,
 /* print out parameters about cpu frequency */
 static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
 {
-bool hwp = strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) == 0;
+bool cppc_mode = false, hw_auto = false;
 int i;
 
+if ( !strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) ||
+ !strcmp(p_cpufreq->scaling_driver, XEN_AMD_CPPC_DRIVER_NAME) ||
+ !strcmp(p_cpufreq->scaling_driver, XEN_AMD_CPPC_EPP_DRIVER_NAME) )
+cppc_mode = true;
+
+if ( !strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) ||
+ !strcmp(p_cpufreq->scaling_driver, XEN_AMD_CPPC_EPP_DRIVER_NAME) )
+hw_auto = true;
+
 printf("cpu id   : %d\n", cpuid);
 
 printf("affected_cpus:");
@@ -800,7 +809,7 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
 printf(" %d", p_cpufreq->affected_cpus[i]);
 printf("\n");
 
-if ( hwp )
+if ( hw_auto )
 printf("cpuinfo frequency: base [%"PRIu32"] max [%"PRIu32"]\n",
p_cpufreq->cpuinfo_min_freq,
p_cpufreq->cpuinfo_max_freq);
@@ -812,7 +821,7 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
 
 printf("scaling_driver   : %s\n", p_cpufreq->scaling_driver);
 
-if ( hwp )
+if ( cppc_mode )
 {
 const xc_cppc_para_t *cppc = &p_cpufreq->u.cppc_para;
 
@@ -838,7 +847,8 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
cppc->desired,
cppc->desired ? "" : " hw autonomous");
 }
-else
+
+if ( !hw_auto )
 {
 if ( p_cpufreq->gov_num )
 printf("scaling_avail_gov: %s\n",
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 767594908c..0e90ffcc19 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -201,7 +201,7 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
 pmpt = processor_pminfo[op->cpuid];
 policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
 
-if ( !pmpt || !pmpt->perf.states ||
+if ( !pmpt || ((pmpt->init & XEN_PX_INIT) && !pmpt->perf.states) ||
  !policy || !policy->governor )
 return -EINVAL;
 
@@ -461,9 +461,10 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
 switch ( op->cmd & PM_PARA_CATEGORY_MASK )
 {
 case CPUFREQ_PARA:
-if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
+if ( !(xen_processor_pmbits & (XEN_PROCESSOR_PM_PX |
+   XEN_PROCESSOR_PM_CPPC)) )
 return -ENODEV;
-if ( !pmpt || !(pmpt->init & XEN_PX_INIT) )
+if ( !pmpt || !(pmpt->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
 return -EINVAL;
 break;
 }
-- 
2.34.1

[PATCH v4 13/15] tools/xenpm: fix unnecessary scaling_available_frequencies in CPPC mode

2025-04-14 Thread Penny Zheng

In `xenpm get-cpufreq-para `, the scaling_available_frequencies parameter
only has a meaningful value when the cpufreq driver is in legacy P-states mode.

So we drop the "has_num" condition check, and mirror the ->gov_num check for
both ->freq_num and ->cpu_num in xc_get_cpufreq_para().
In get_cpufreq_para(), add a "freq_num" check to avoid copying data to
op->u.get_para.scaling_available_frequencies in CPPC mode.
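
The "mirrored" check amounts to validating each buffer only when its count is non-zero. A minimal stand-alone sketch, using a simplified hypothetical structure rather than the real struct xc_get_cpufreq_para:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the caller-provided parameter block. */
struct ex_cpufreq_para {
    uint32_t cpu_num, freq_num, gov_num;
    uint32_t *affected_cpus;
    uint32_t *scaling_available_frequencies;
    char *scaling_available_governors;
};

/* A buffer is only required when the matching count is non-zero. */
int ex_check_buffers(const struct ex_cpufreq_para *p)
{
    if ( (p->cpu_num && !p->affected_cpus) ||
         (p->freq_num && !p->scaling_available_frequencies) ||
         (p->gov_num && !p->scaling_available_governors) )
    {
        errno = EINVAL;
        return -1;
    }

    return 0;
}
```

With this shape, a CPPC-mode caller can legitimately pass freq_num == 0 and no frequency buffer while still requesting the affected CPUs.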

Signed-off-by: Penny Zheng 
---
v3 -> v4:
- drop the "has_num" condition check
---
 tools/libs/ctrl/xc_pm.c   | 45 +--
 xen/drivers/acpi/pmstat.c | 11 ++
 2 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index ff7b5ada05..2089aa41b3 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -212,34 +212,39 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 DECLARE_NAMED_HYPERCALL_BOUNCE(scaling_available_governors,
 user_para->scaling_available_governors,
 user_para->gov_num * CPUFREQ_NAME_LEN * sizeof(char), XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
-bool has_num = user_para->cpu_num && user_para->freq_num;
 
-if ( has_num )
+if ( (user_para->cpu_num && !user_para->affected_cpus) ||
+ (user_para->freq_num && !user_para->scaling_available_frequencies) ||
+ (user_para->gov_num && !user_para->scaling_available_governors) )
+{
+errno = EINVAL;
+return -1;
+}
+if ( user_para->cpu_num )
 {
-if ( (!user_para->affected_cpus)||
- (!user_para->scaling_available_frequencies)||
- (user_para->gov_num && !user_para->scaling_available_governors) )
-{
-errno = EINVAL;
-return -1;
-}
 ret = xc_hypercall_bounce_pre(xch, affected_cpus);
 if ( ret )
 return ret;
+}
+if ( user_para->freq_num )
+{
 ret = xc_hypercall_bounce_pre(xch, scaling_available_frequencies);
 if ( ret )
 goto unlock_2;
-if ( user_para->gov_num )
-ret = xc_hypercall_bounce_pre(xch, scaling_available_governors);
-if ( ret )
-goto unlock_3;
+}
+if ( user_para->gov_num )
+ret = xc_hypercall_bounce_pre(xch, scaling_available_governors);
+if ( ret )
+goto unlock_3;
 
+if ( user_para->cpu_num )
 set_xen_guest_handle(sys_para->affected_cpus, affected_cpus);
-set_xen_guest_handle(sys_para->scaling_available_frequencies, scaling_available_frequencies);
-if ( user_para->gov_num )
-set_xen_guest_handle(sys_para->scaling_available_governors,
- scaling_available_governors);
-}
+if ( user_para->freq_num )
+set_xen_guest_handle(sys_para->scaling_available_frequencies,
+ scaling_available_frequencies);
+if ( user_para->gov_num )
+set_xen_guest_handle(sys_para->scaling_available_governors,
+ scaling_available_governors);
 
 sysctl.cmd = XEN_SYSCTL_pm_op;
 sysctl.u.pm_op.cmd = GET_CPUFREQ_PARA;
@@ -258,9 +263,7 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 user_para->gov_num  = sys_para->gov_num;
 }
 
-if ( has_num )
-goto unlock_4;
-return ret;
+goto unlock_4;
 }
 else
 {
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 0e90ffcc19..83cfef398e 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -228,10 +228,13 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
 ret = copy_to_guest(op->u.get_para.affected_cpus,
 data, op->u.get_para.cpu_num);
 
-for ( i = 0; i < op->u.get_para.freq_num; i++ )
-data[i] = pmpt->perf.states[i].core_frequency * 1000;
-ret += copy_to_guest(op->u.get_para.scaling_available_frequencies,
- data, op->u.get_para.freq_num);
+if ( op->u.get_para.freq_num )
+{
+for ( i = 0; i < op->u.get_para.freq_num; i++ )
+data[i] = pmpt->perf.states[i].core_frequency * 1000;
+ret += copy_to_guest(op->u.get_para.scaling_available_frequencies,
+ data, op->u.get_para.freq_num);
+}
 
 xfree(data);
 if ( ret )
-- 
2.34.1

[PATCH v4 14/15] tools/xenpm: remove px_cap dependency check for average frequency

2025-04-14 Thread Penny Zheng

In the `xenpm start` command, monitoring of the average frequency shall
not depend on the existence of legacy P-states, so remove the "px_cap"
dependency check.

Signed-off-by: Penny Zheng 
---
v3 -> v4:
- new commit
---
 tools/misc/xenpm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 29fffebebd..b823e5c433 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -539,7 +539,7 @@ static void signal_int_handler(int signo)
 res / 100UL, 100UL * res / (double)sum_px[i]);
 }
 }
-if ( px_cap && avgfreq[i] )
+if ( avgfreq[i] )
 printf("  Avg freq\t%d\tKHz\n", avgfreq[i]);
 }
 
-- 
2.34.1

Re: [PATCH 2/5] xen/io: provide helpers for multi size MMIO accesses

2025-04-14 Thread Jan Beulich

On 14.04.2025 09:49, Julien Grall wrote:
> On 14/04/2025 15:07, Jan Beulich wrote:
>> On 11.04.2025 12:54, Roger Pau Monne wrote:
>>> Several handlers have the same necessity of reading from an MMIO region
>>> using 1, 2, 4 or 8 bytes accesses.  So far this has been open-coded in the
>>> function itself.  Instead provide a new handler that encapsulates the
>>> accesses.
>>>
>>> Since the added helpers are not architecture specific, introduce a new
>>> generic io.h header.
>>
>> Except that ...
>>
>>> --- /dev/null
>>> +++ b/xen/include/xen/io.h
>>> @@ -0,0 +1,63 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Generic helpers for doing MMIO accesses.
>>> + *
>>> + * Copyright (c) 2025 Cloud Software Group
>>> + */
>>> +#ifndef XEN_IO_H
>>> +#define XEN_IO_H
>>> +
>>> +#include 
>>> +
>>> +#include 
>>> +
>>> +static inline uint64_t read_mmio(const volatile void __iomem *mem,
>>> + unsigned int size)
>>> +{
>>> +    switch ( size )
>>> +    {
>>> +    case 1:
>>> +    return readb(mem);
>>> +
>>> +    case 2:
>>> +    return readw(mem);
>>> +
>>> +    case 4:
>>> +    return readl(mem);
>>> +
>>> +    case 8:
>>> +    return readq(mem);
>>
>> ... this and ...
>>
>>> +    }
>>> +
>>> +    ASSERT_UNREACHABLE();
>>> +    return ~0UL;
>>> +}
>>> +
>>> +static inline void write_mmio(volatile void __iomem *mem, uint64_t data,
>>> +  unsigned int size)
>>> +{
>>> +    switch ( size )
>>> +    {
>>> +    case 1:
>>> +    writeb(data, mem);
>>> +    break;
>>> +
>>> +    case 2:
>>> +    writew(data, mem);
>>> +    break;
>>> +
>>> +    case 4:
>>> +    writel(data, mem);
>>> +    break;
>>> +
>>> +    case 8:
>>> +    writeq(data, mem);
>>> +    break;
>>
>> ... this may (generally will) not work on 32-bit architectures. Add
>> CONFIG_64BIT conditionals? At which point return type / last parameter
>> type could move from uint64_t to unsigned long.
> 
> Technically arm32 bit supports 64-bit write because we mandate LPAE. I see 
> this is used by the vPCI code. Are we expecting to have any 64-bit access?

vPCI is, I think, supposed to not see 64-bit accesses (to config space).
However, vMSI-X already may see such.
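
For illustration only, the CONFIG_64BIT suggestion above could look roughly like the sketch below. It uses plain volatile loads instead of readb()/readw()/readl()/readq() so that it is self-contained, and a ULONG_MAX test as a stand-in for CONFIG_64BIT; both are assumptions of this example, not what the patch does.

```c
#include <limits.h>
#include <stdint.h>

/* Stand-in for CONFIG_64BIT: 8-byte accesses only if unsigned long is 64-bit. */
#if ULONG_MAX > 0xffffffffUL
# define EX_64BIT 1
#endif

unsigned long ex_read_mmio(const volatile void *mem, unsigned int size)
{
    switch ( size )
    {
    case 1:
        return *(const volatile uint8_t *)mem;

    case 2:
        return *(const volatile uint16_t *)mem;

    case 4:
        return *(const volatile uint32_t *)mem;

#ifdef EX_64BIT
    case 8:
        return *(const volatile uint64_t *)mem;
#endif
    }

    /* Unsupported size: all ones, as in the original helper. */
    return ~0UL;
}
```

Whether 8-byte accesses should instead remain available on arm32 (given LPAE) is exactly the open question in this sub-thread; the sketch takes no position on that.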

Jan

Re: [PATCH] CI: fix waiting for final test message (again)

2025-04-14 Thread Andrew Cooper

On 13/04/2025 2:47 pm, Marek Marczykowski-Górecki wrote:
> The previous attempt has correct diagnosis, but added -notransfer flag
> in a wrong place - it should be used in the first (outer) match out of
> two, not the second (inner) one.
>
> Fixes: 1e12cbd6af2c ("CI: fix waiting for final test message")
> Signed-off-by: Marek Marczykowski-Górecki 

Acked-by: Andrew Cooper 

> This actually fixes the issue described in the referenced commit. When
> that issue happens, it can be seen as a complete console output (up to
> Alpine login prompt), but test still failed.
> But that is not all the issues, sometimes it hangs really in the middle
> of dom0 boot, for example with last lines as:
>
> [1.816052] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> Poking KASLR using RDRAND RDTSC...
> [1.818089] Dynamic Preempt: voluntary
> [1.818251] rcu: Preemptible hierarchical RCU implementation.
> [1.818254] rcu:   RCU event tracing is ena
>
> and sits there for over 120s.
>
> It's unclear to me yet whether it's a real dom0 hang, or an issue with
> grabbing console output. Debugging...

This is now the only failure I've been encountering, given the extensive
runs over the weekend.

~Andrew

Re: [PATCH v3 11/16] x86/hyperlaunch: locate dom0 initrd with hyperlaunch

2025-04-14 Thread Alejandro Vallejo

On Wed Apr 9, 2025 at 11:07 PM BST, Denis Mukhin wrote:
> On Tuesday, April 8th, 2025 at 9:07 AM, Alejandro Vallejo  
> wrote:
>
>> 
>> 
>> From: "Daniel P. Smith" dpsm...@apertussolutions.com
>> 
>> 
>> Look for a subnode of type `multiboot,ramdisk` within a domain node and
>> parse via the fdt_read_multiboot_module() helper. After a successful
>> helper call, the module index is returned and the module is guaranteed
>> to be in the module list.
>> 
>> Fix unused typo in adjacent comment.
>> 
>> Signed-off-by: Daniel P. Smith dpsm...@apertussolutions.com
>> 
>> Signed-off-by: Jason Andryuk jason.andr...@amd.com
>> 
>> Signed-off-by: Alejandro Vallejo agarc...@amd.com
>> 
>> ---
>> v3:
>> * Reworded commit message to state the helper postconditions.
>> * Wrapped long line
>> * Fix ramdisk -> module rename
>> 
>> * Move ramdisk parsing from later patch
>> * Remove initrdidx indent
>> ---
>> xen/arch/x86/domain-builder/fdt.c | 29 +
>> xen/arch/x86/setup.c | 4 ++--
>> 2 files changed, 31 insertions(+), 2 deletions(-)
>> 
>> diff --git a/xen/arch/x86/domain-builder/fdt.c 
>> b/xen/arch/x86/domain-builder/fdt.c
>> index bc9903a9de..0f5fd01557 100644
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -195,6 +195,35 @@ static int __init process_domain_node(
>> !((char *)__va(bd->kernel->cmdline_pa))[0] )
>> 
>> bd->kernel->fdt_cmdline = fdt_get_prop_offset(
>> 
>> fdt, node, "bootargs", &bd->kernel->cmdline_pa);
>> 
>> +
>> + continue;
>> + }
>> + else if ( fdt_node_check_compatible(fdt, node,
>> + "multiboot,ramdisk") == 0 )
>> + {
>> + int idx;
>> +
>> + if ( bd->module )
>> 
>> + {
>> + printk(XENLOG_ERR "Duplicate ramdisk module for domain %s)\n",
>
> I would start the message with lower case so it is consistent with the other 
> one.

As mentioned before, this is due to how it's meant to be rendered. This
is a standalone message, hence the uppercase (consistent with the
duplicate kernel).

Will change the XENLOG_ERR into XENLOG_WARNING though.

>
>> + name);
>> + continue;
>> + }
>> +
>> + idx = fdt_read_multiboot_module(fdt, node, address_cells,
>> + size_cells,bi);
>> + if ( idx < 0 )
>> + {
>> + printk(" failed processing ramdisk module for domain %s\n",
>> + name);
>
> Prepend the log message with XENLOG_ERR ?

Indeed.

>
>> + return -EINVAL;
>> + }
>> +
>> + printk(" ramdisk: boot module %d\n", idx);
>> + bi->mods[idx].type = BOOTMOD_RAMDISK;
>> 
>> + bd->module = &bi->mods[idx];
>> 
>> +
>> + continue;
>> }
>> }
>> 
>> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
>> index ca4415d020..3dfa81b48c 100644
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -2149,11 +2149,11 @@ void asmlinkage __init noreturn __start_xen(void)
>> * At this point all capabilities that consume boot modules should have
>> * claimed their boot modules. Find the first unclaimed boot module and
>> * claim it as the initrd ramdisk. Do a second search to see if there are
>> - * any remaining unclaimed boot modules, and report them as unusued initrd
>> + * any remaining unclaimed boot modules, and report them as unused initrd
>> * candidates.
>> */
>> initrdidx = first_boot_module_index(bi, BOOTMOD_UNKNOWN);
>> - if ( initrdidx < MAX_NR_BOOTMODS )
>> + if ( !bi->hyperlaunch_enabled && initrdidx < MAX_NR_BOOTMODS )
>> 
>> {
>> bi->mods[initrdidx].type = BOOTMOD_RAMDISK;
>> 
>> bi->domains[0].module = &bi->mods[initrdidx];
>> 
>> --
>> 2.43.0

Cheers,
Alejandro

Re: [PATCH v1 02/14] xen/riscv: introduce smp_clear_cpu_maps()

2025-04-14 Thread Jan Beulich

On 14.04.2025 17:05, Oleksii Kurochko wrote:
> On 4/10/25 3:10 PM, Jan Beulich wrote:
>> On 08.04.2025 17:57, Oleksii Kurochko wrote:
>>> +void __init smp_clear_cpu_maps(void)
>>> +{
>>> +cpumask_clear(&cpu_possible_map);
>>> +cpumask_clear(&cpu_online_map);
>> What's the point of these? All three maps start out fully zeroed.
> 
> It could be really dropped. I saw your patch for Arm, I'll align the current
> patch with that changes.
> 
>>> +cpumask_set_cpu(0, &cpu_possible_map);
>>> +cpumask_set_cpu(0, &cpu_online_map);
>> These are contradicting the name of the function. The somewhat equivalent
>> function we have on x86 is smp_prepare_boot_cpu().
>>
>>> +cpumask_copy(&cpu_present_map, &cpu_possible_map);
>> Another cpumask_set_cpu() is probably cheaper here then.
> 
> What do you mean by cheaper here?

Less code to execute to achieve the same effect.

Jan
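
For illustration only, here is a minimal sketch of the shape Jan's comments point at. It assumes the maps start zeroed (as stated above) and uses a hypothetical 64-bit stand-in for cpumask_t; cpumask_sketch_t, mask_set_cpu(), mask_test_cpu() and prepare_boot_cpu_sketch() are made-up names, not Xen APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for Xen's cpumask_t: a fixed 64-CPU bitmap. */
#define NR_CPUS_SKETCH 64
typedef struct { unsigned long long bits; } cpumask_sketch_t;

/* File-scope objects are zero-initialised, like the real maps in .bss. */
static cpumask_sketch_t cpu_possible_map, cpu_online_map, cpu_present_map;

static void mask_set_cpu(unsigned int cpu, cpumask_sketch_t *m)
{
    assert(cpu < NR_CPUS_SKETCH);
    m->bits |= 1ULL << cpu;
}

static bool mask_test_cpu(unsigned int cpu, const cpumask_sketch_t *m)
{
    return (m->bits >> cpu) & 1;
}

/*
 * No clearing and no copy: each map gets exactly one bit set, which is
 * less work than clearing two maps and copying a whole map afterwards.
 */
static void prepare_boot_cpu_sketch(void)
{
    mask_set_cpu(0, &cpu_possible_map);
    mask_set_cpu(0, &cpu_online_map);
    mask_set_cpu(0, &cpu_present_map);
}
```

The function name is modelled on the x86 smp_prepare_boot_cpu() mentioned above; whether the RISC-V patch ends up with that name is not settled in this thread.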

[PATCH v7 0/3] Enable early bootup of Armv8-R AArch32 systems

2025-04-14 Thread Ayan Kumar Halder

Enable early booting of Armv8-R AArch32 based systems.

Added Luca's R-b in all the patches.
Added Michal's R-b in patch 1 and 3.

Ayan Kumar Halder (3):
  xen/arm: Move some of the functions to common file
  xen/arm32: Create the same boot-time MPU regions as arm64
  xen/arm32: mpu: Stubs to build MPU for arm32

 xen/arch/arm/arm32/Makefile  |   1 +
 xen/arch/arm/arm32/mpu/Makefile  |   3 +
 xen/arch/arm/arm32/mpu/head.S| 104 +++
 xen/arch/arm/arm32/mpu/p2m.c |  19 +
 xen/arch/arm/arm32/mpu/smpboot.c |  26 ++
 xen/arch/arm/arm64/mpu/head.S|  78 +
 xen/arch/arm/include/asm/arm32/sysregs.h |  13 ++-
 xen/arch/arm/include/asm/arm64/sysregs.h |  13 +++
 xen/arch/arm/include/asm/cpregs.h|   2 +
 xen/arch/arm/include/asm/mm.h|   9 +-
 xen/arch/arm/include/asm/mmu/mm.h|   7 ++
 xen/arch/arm/include/asm/mpu/cpregs.h|  32 +++
 xen/arch/arm/include/asm/mpu/mm.h|   5 ++
 xen/arch/arm/include/asm/mpu/regions.inc |  79 +
 xen/arch/arm/mpu/Makefile|   1 +
 xen/arch/arm/mpu/domain_page.c   |  45 ++
 16 files changed, 350 insertions(+), 87 deletions(-)
 create mode 100644 xen/arch/arm/arm32/mpu/Makefile
 create mode 100644 xen/arch/arm/arm32/mpu/head.S
 create mode 100644 xen/arch/arm/arm32/mpu/p2m.c
 create mode 100644 xen/arch/arm/arm32/mpu/smpboot.c
 create mode 100644 xen/arch/arm/include/asm/mpu/cpregs.h
 create mode 100644 xen/arch/arm/include/asm/mpu/regions.inc
 create mode 100644 xen/arch/arm/mpu/domain_page.c

-- 
2.25.1

[PATCH v7 2/3] xen/arm32: Create the same boot-time MPU regions as arm64

2025-04-14 Thread Ayan Kumar Halder

Create Boot-time MPU protection regions (similar to Armv8-R AArch64) for
Armv8-R AArch32.
Also, define *_PRBAR macros for arm32. The only difference from arm64 is that
XN is 1-bit for arm32.
Define the system registers and macros in mpu/cpregs.h.

Introduce WRITE_SYSREG_ASM() to write to system registers in assembly.

Signed-off-by: Ayan Kumar Halder 
Reviewed-by: Luca Fancellu 
Tested-by: Luca Fancellu 
---
Changes from

v1 -

1. enable_mpu() now sets HMAIR{0,1} registers. This is similar to what is
being done in enable_mmu(). All the mm related configurations happen in this
function.

2. Fixed some typos. 

v2 -
1. Include the common prepare_xen_region.inc in head.S.

2. Define LOAD_SYSREG()/STORE_SYSREG() for arm32.

v3 -
1. Rename STORE_SYSREG() as WRITE_SYSREG_ASM()

2. enable_boot_cpu_mm() is defined in head.S

v4 -
1. *_PRBAR is moved to arm32/sysregs.h.

2. MPU specific CP15 system registers are defined in mpu/cpregs.h. 

v5 -
1. WRITE_SYSREG_ASM is enclosed within #ifdef __ASSEMBLY__

2. enable_mpu() clobbers r0 only.

3. Definitions in mpu/cpregs.h are enclosed within ARM_32.

4. Removed some #ifdefs and style changes.

v6 -
1. Coding style issues.

2. Kept Luca's R-b and T-b as the changes should not impact the behavior.

3. Added alias and renamed the sysregs as it is named in the specs.

 xen/arch/arm/arm32/Makefile  |   1 +
 xen/arch/arm/arm32/mpu/Makefile  |   1 +
 xen/arch/arm/arm32/mpu/head.S| 104 +++
 xen/arch/arm/include/asm/arm32/sysregs.h |  13 ++-
 xen/arch/arm/include/asm/cpregs.h|   2 +
 xen/arch/arm/include/asm/mpu/cpregs.h|  32 +++
 6 files changed, 151 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/arm/arm32/mpu/Makefile
 create mode 100644 xen/arch/arm/arm32/mpu/head.S
 create mode 100644 xen/arch/arm/include/asm/mpu/cpregs.h

diff --git a/xen/arch/arm/arm32/Makefile b/xen/arch/arm/arm32/Makefile
index 40a2b4803f..537969d753 100644
--- a/xen/arch/arm/arm32/Makefile
+++ b/xen/arch/arm/arm32/Makefile
@@ -1,5 +1,6 @@
 obj-y += lib/
 obj-$(CONFIG_MMU) += mmu/
+obj-$(CONFIG_MPU) += mpu/
 
 obj-$(CONFIG_EARLY_PRINTK) += debug.o
 obj-y += domctl.o
diff --git a/xen/arch/arm/arm32/mpu/Makefile b/xen/arch/arm/arm32/mpu/Makefile
new file mode 100644
index 00..3340058c08
--- /dev/null
+++ b/xen/arch/arm/arm32/mpu/Makefile
@@ -0,0 +1 @@
+obj-y += head.o
diff --git a/xen/arch/arm/arm32/mpu/head.S b/xen/arch/arm/arm32/mpu/head.S
new file mode 100644
index 00..b2c5245e51
--- /dev/null
+++ b/xen/arch/arm/arm32/mpu/head.S
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Start-of-day code for an Armv8-R-AArch32 MPU system.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Set up the memory attribute type tables and enable EL2 MPU and data cache.
+ * If the Background region is enabled, then the MPU uses the default memory
+ * map as the Background region for generating the memory
+ * attributes when MPU is disabled.
+ * Since the default memory map of the Armv8-R AArch32 architecture is
+ * IMPLEMENTATION DEFINED, we intend to turn off the Background region here.
+ *
+ * Clobbers r0
+ */
+FUNC_LOCAL(enable_mpu)
+/* Set up memory attribute type tables */
+mov_w r0, MAIR0VAL
+mcr   CP32(r0, HMAIR0)
+mov_w r0, MAIR1VAL
+mcr   CP32(r0, HMAIR1)
+
+mrc   CP32(r0, HSCTLR)
+bic   r0, r0, #SCTLR_ELx_BR   /* Disable Background region */
+orr   r0, r0, #SCTLR_Axx_ELx_M/* Enable MPU */
+orr   r0, r0, #SCTLR_Axx_ELx_C/* Enable D-cache */
+mcr   CP32(r0, HSCTLR)
+isb
+
+ret
+END(enable_mpu)
+
+/*
+ * Maps the various sections of Xen (described in xen.lds.S) as different MPU
+ * regions.
+ *
+ * Clobbers r0 - r5
+ *
+ */
+FUNC(enable_boot_cpu_mm)
+/* Get the number of regions specified in MPUIR_EL2 */
+mrc   CP32(r5, MPUIR_EL2)
+and   r5, r5, #NUM_MPU_REGIONS_MASK
+
+/* r0: region sel */
+mov   r0, #0
+/* Xen text section. */
+mov_w   r1, _stext
+mov_w   r2, _etext
+prepare_xen_region r0, r1, r2, r3, r4, r5, attr_prbar=REGION_TEXT_PRBAR
+
+/* Xen read-only data section. */
+mov_w   r1, _srodata
+mov_w   r2, _erodata
+prepare_xen_region r0, r1, r2, r3, r4, r5, attr_prbar=REGION_RO_PRBAR
+
+/* Xen read-only after init and data section. (RW data) */
+mov_w   r1, __ro_after_init_start
+mov_w   r2, __init_begin
+prepare_xen_region r0, r1, r2, r3, r4, r5
+
+/* Xen code section. */
+mov_w   r1, __init_begin
+mov_w   r2, __init_data_begin
+prepare_xen_region r0, r1, r2, r3, r4, r5, attr_prbar=REGION_TEXT_PRBAR
+
+/* Xen data and BSS section. */
+mov_w   r1, __init_data_begin
+mov_w   r2, __bss_end
+prepare_xen_region r0, r1, r2, r3, r4, r5
+
+#ifdef CONFIG_EARLY_PRINTK
+/* Xen early UART section. */
+mov_w   r1, CONFIG_EARLY_UART_BASE_ADDRESS
+mov_w   r2, (CONFIG_EARLY_UART_BASE_ADDRESS + CONFIG_EARLY_U

Re: [PATCH v3 13/16] x86/hyperlaunch: specify dom0 mode with device tree

2025-04-14 Thread Alejandro Vallejo

On Wed Apr 9, 2025 at 11:24 PM BST, Denis Mukhin wrote:
> On Tuesday, April 8th, 2025 at 9:07 AM, Alejandro Vallejo  
> wrote:
>
>> 
>> 
>> From: "Daniel P. Smith" dpsm...@apertussolutions.com
>> 
>> 
>> Enable selecting the mode in which the domain will be built and ran. This
>> includes:
>> 
>> - whether it will be either a 32/64 bit domain
>> - if it will be run as a PV or HVM domain
>> - and if it will require a device model (not applicable for dom0)
>> 
>> In the device tree, this will be represented as a bit map that will be 
>> carried
>> through into struct boot_domain.
>> 
>> Signed-off-by: Daniel P. Smith dpsm...@apertussolutions.com
>> 
>> Reviewed-by: Jason Andryuk jason.andr...@amd.com
>> 
>> ---
>> xen/arch/x86/domain-builder/fdt.c | 19 +++
>> xen/arch/x86/include/asm/boot-domain.h | 5 +
>> xen/arch/x86/setup.c | 3 ++-
>> 3 files changed, 26 insertions(+), 1 deletion(-)
>> 
>> diff --git a/xen/arch/x86/domain-builder/fdt.c 
>> b/xen/arch/x86/domain-builder/fdt.c
>> index 4c6aafe195..da65f6a5a0 100644
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -193,6 +193,25 @@ static int __init process_domain_node(
>> bd->domid = (domid_t)val;
>> 
>> printk(" domid: %d\n", bd->domid);
>> 
>> }
>> + else if ( strncmp(prop_name, "mode", name_len) == 0 )
>> + {
>> + if ( fdt_prop_as_u32(prop, &bd->mode) != 0 )
>> 
>> + {
>> + printk(" failed processing mode for domain %s\n", name);
>> + return -EINVAL;
>> + }
>> +
>> + printk(" mode: ");
>> + if ( !(bd->mode & BUILD_MODE_PARAVIRT) )
>> 
>> + {
>> + if ( bd->mode & BUILD_MODE_ENABLE_DM )
>> 
>> + printk("HVM\n");
>> + else
>> + printk("PVH\n");
>> + }
>> + else
>> + printk("PV\n");
>> + }
>> }
>
> I would re-write so the positive condition is processed first, e.g.:
>
> if ( bd->mode & BUILD_MODE_PARAVIRT )
> printk("PV\n");
> else if ( bd->mode & BUILD_MODE_ENABLE_DM )
> printk("HVM\n");
> else
> printk("PVH\n");
>
> I think it will reduce indentation and make code block a bit easier to read.
>

For sure. You're absolutely right.

> Also, if there are more uses for printing string representation of a
> boot module mode in the future, perhaps move it to a separate function?
>
> What do you think?

If there are more existing cases I'm happy to unify them, but otherwise
I'd rather keep the code inlined to avoid too much indirection.

Cheers,
Alejandro
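
Purely to illustrate the ordering Denis suggests (positive condition first, no nesting), a sketch follows. The bit positions are assumptions made up for the example; the real BUILD_MODE_* values live in asm/boot-domain.h, and build_mode_name() is a hypothetical helper that is not part of the patch.

```c
#include <stdint.h>

/* Assumed bit assignments, for illustration only. */
#define BUILD_MODE_PARAVIRT   (1U << 0)
#define BUILD_MODE_ENABLE_DM  (1U << 1)

/* Map a mode bitmap to the label printed for the domain. */
static const char *build_mode_name(uint32_t mode)
{
    if ( mode & BUILD_MODE_PARAVIRT )
        return "PV";

    if ( mode & BUILD_MODE_ENABLE_DM )
        return "HVM";

    return "PVH";
}
```

The same three-way chain can stay inline in process_domain_node(), which is what the reply above prefers while there is only one user.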

Re: [PATCH v3 15/16] x86/hyperlaunch: add max vcpu parsing of hyperlaunch device tree

2025-04-14 Thread Alejandro Vallejo

On Thu Apr 10, 2025 at 1:08 PM BST, Jan Beulich wrote:
> On 08.04.2025 18:07, Alejandro Vallejo wrote:
>> From: "Daniel P. Smith" 
>> 
>> Introduce the `cpus` property, named as such for dom0less compatibility, that
>> represents the maximum number of vpcus to allocate for a domain. In the 
>> device
>
> Nit: vcpus

Ack, and same below

>
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -246,6 +246,17 @@ static int __init process_domain_node(
>>  bd->max_pages = PFN_DOWN(kb * SZ_1K);
>>  printk("  max memory: %ld kb\n", kb);
>>  }
>> +else if ( strncmp(prop_name, "cpus", name_len) == 0 )
>> +{
>> +uint32_t val = UINT_MAX;
>> +if ( fdt_prop_as_u32(prop, &val) != 0 )
>
> And again the same nit.
>
>> +{
>> +printk("  failed processing max_vcpus for domain %s\n", 
>> name);
>
> There's no "max_vcpus" being processed here; that purely ...
>
>> +return -EINVAL;
>> +}
>> +bd->max_vcpus = val;
>
> ... the internal name we use for the struct field etc. The user observing the
> message ought to be able to easily associate it back with the DT item.
>
> Jan

Very true. I agree, and will change accordingly.

Cheers,
Alejandro

[PATCH AUTOSEL 6.14 07/34] xen: Change xen-acpi-processor dom0 dependency

2025-04-14 Thread Sasha Levin

From: Jason Andryuk 

[ Upstream commit 0f2946bb172632e122d4033e0b03f85230a29510 ]

xen-acpi-processor functions under a PVH dom0 with only a
xen_initial_domain() runtime check.  Change the Kconfig dependency from
PV dom0 to generic dom0 to reflect that.

Suggested-by: Jan Beulich 
Signed-off-by: Jason Andryuk 
Reviewed-by: Juergen Gross 
Tested-by: Jan Beulich 
Signed-off-by: Juergen Gross 
Message-ID: <20250331172913.51240-1-jason.andr...@amd.com>
Signed-off-by: Sasha Levin 
---
 drivers/xen/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index f7d6f47971fdf..24f485827e039 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -278,7 +278,7 @@ config XEN_PRIVCMD_EVENTFD
 
 config XEN_ACPI_PROCESSOR
tristate "Xen ACPI processor"
-   depends on XEN && XEN_PV_DOM0 && X86 && ACPI_PROCESSOR && CPU_FREQ
+   depends on XEN && XEN_DOM0 && X86 && ACPI_PROCESSOR && CPU_FREQ
default m
help
  This ACPI processor uploads Power Management information to the Xen
-- 
2.39.5

Re: [PATCH v3] xen/riscv: Increase XEN_VIRT_SIZE

2025-04-14 Thread Oleksii Kurochko



On 4/10/25 10:48 AM, Jan Beulich wrote:

On 09.04.2025 21:01, Oleksii Kurochko wrote:

--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -9,6 +9,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  #include 

@@ -35,6 +36,11 @@ static inline void *maddr_to_virt(paddr_t ma)
  return (void *)va;
  }
  
+#define is_init_section(p) ({   \

+char *p_ = (char *)(unsigned long)(p);  \
+(p_ >= __init_begin) && (p_ < __init_end);  \
+})

I think this wants to be put in xen/sections.h, next to where __init_{begin,end}
are declared. But first it wants making const-correct, to eliminate the 
potential
of it indirectly casting away const-ness from the incoming argument.

(At some point related stuff wants moving from kernel.h to sections.h, I 
suppose.
And at that point they will all want to have const added.)


Sure, I'll change to 'const char *p_ = (const char*)(unsigned long)(p)'.


--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -31,20 +31,24 @@ unsigned long __ro_after_init phys_offset; /* = load_start 
- XEN_VIRT_START */
  #define LOAD_TO_LINK(addr) ((unsigned long)(addr) - phys_offset)
  
  /*

- * It is expected that Xen won't be more then 2 MB.
+ * It is expected that Xen won't be more then XEN_VIRT_SIZE MB.

Why "MB" when the macro already expands to MB(16)?


It should indeed be dropped; there is no need for MB in the comment.




   * The check in xen.lds.S guarantees that.
- * At least 3 page tables (in case of Sv39 ) are needed to cover 2 MB.
- * One for each page level table with PAGE_SIZE = 4 Kb.
   *
- * One L0 page table can cover 2 MB(512 entries of one page table * PAGE_SIZE).
+ * Root page table is shared with the initial mapping and is declared
+ * separetely. (look at stage1_pgtbl_root)

Nit: separately


   *
- * It might be needed one more page table in case when Xen load address
- * isn't 2 MB aligned.
+ * An amount of page tables between root page table and L0 page table
+ * (in the case of Sv39 it covers L1 table):
+ *   (CONFIG_PAGING_LEVELS - 2) are needed for an identity mapping and
+ *   the same amount are needed for Xen.
   *
- * CONFIG_PAGING_LEVELS page tables are needed for the identity mapping,
- * except that the root page table is shared with the initial mapping
+ * An amount of L0 page tables:
+ *   (512 entries of one L0 page table covers 2MB == 1<<XEN_PT_LEVEL_SHIFT(1))
+ *   XEN_VIRT_SIZE >> XEN_PT_LEVEL_SHIFT(1) are needed for Xen and
+ *   one L0 is needed for indenity mapping.

Nit: identity

But more importantly, where's this one L0 ...


   */
-#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) * 2 + 1)
+#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 2) * 2 + \
+ (XEN_VIRT_SIZE >> XEN_PT_LEVEL_SHIFT(1)))

 in this calculation?


The L0 for the identity mapping is indeed missing.

Thanks.

~ Oleksii
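
As a worked check of the arithmetic discussed above, here is a small sketch with the identity-mapping L0 added to the count. The values are assumptions for the example (Sv39, 4 KiB pages, 9 bits per level, XEN_VIRT_SIZE == MB(16) as mentioned in the review); the macro and function names ending in _SKETCH/_sketch are hypothetical.

```c
/* Assumed configuration: Sv39 with 4 KiB pages and 512-entry tables. */
#define CONFIG_PAGING_LEVELS      3
#define PAGE_SHIFT                12
#define PAGETABLE_ORDER           9
#define MB(n)                     ((unsigned long)(n) << 20)
#define XEN_VIRT_SIZE             MB(16)
#define XEN_PT_LEVEL_SHIFT(lvl)   ((lvl) * PAGETABLE_ORDER + PAGE_SHIFT)

/*
 * Intermediate tables: (levels - 2) for the identity mapping plus the
 * same amount for Xen.  L0 tables: one per 2MB of XEN_VIRT_SIZE, plus
 * one more for the identity mapping.
 */
#define PGTBL_INITIAL_COUNT_SKETCH                      \
    ((CONFIG_PAGING_LEVELS - 2) * 2 +                   \
     (XEN_VIRT_SIZE >> XEN_PT_LEVEL_SHIFT(1)) +         \
     1)

static unsigned long pgtbl_initial_count_sketch(void)
{
    return PGTBL_INITIAL_COUNT_SKETCH;
}
```

With these assumed values the terms are 2 (intermediate tables), 8 (L0 tables for 16MB of Xen) and 1 (identity-mapping L0), 11 in total.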

Re: [PATCH 3/5] x86/hvm: fix handling of accesses to partial r/o MMIO pages

2025-04-14 Thread Roger Pau Monné

On Mon, Apr 14, 2025 at 08:33:44AM +0200, Jan Beulich wrote:
> On 11.04.2025 12:54, Roger Pau Monne wrote:
> > The current logic to handle accesses to MMIO pages partially read-only is
> > based on the (now removed) logic used to handle accesses to the r/o MMCFG
> > region(s) for PVH v1 dom0.  However that has issues when running on AMD
> > hardware, as in that case the guest linear address that triggered the fault
> > is not provided as part of the VM exit.  This caused
> > mmio_ro_emulated_write() to always fail before calling
> > subpage_mmio_write_emulate() when running on AMD and called from an HVM
> > context.
> > 
> > Take a different approach and convert the handling of partial read-only
> > MMIO page accesses into an HVM MMIO ops handler, as that's the more natural
> > way to handle this kind of emulation for HVM domains.
> > 
> > This allows getting rid of hvm_emulate_one_mmio() and its single call site
> > in hvm_hap_nested_page_fault().
> > 
> > Note a small adjustment is needed to the `pf-fixup` dom0 PVH logic: avoid
> > attempting to fixup faults resulting from accesses to read-only MMIO
> > regions, as handling of those accesses is now done by handle_mmio().
> > 
> > Fixes: 33c19df9a5a0 ('x86/PCI: intercept accesses to RO MMIO from dom0s in 
> > HVM containers')
> > Signed-off-by: Roger Pau Monné 
> > ---
> > The fixes tag is maybe a bit wonky, it's either this or:
> > 
> > 8847d6e23f97 ('x86/mm: add API for marking only part of a MMIO page read 
> > only')
> > 
> > However the addition of subpage r/o access handling to the existing
> > mmio_ro_emulated_write() function was done based on the assumption that the
> > current code was working - which turned out to not be the case for AMD,
> > hence my preference for blaming the commit that actually introduced the
> > broken logic.
> 
> I agree.
> 
> > ---
> >  xen/arch/x86/hvm/emulate.c | 47 +-
> >  xen/arch/x86/hvm/hvm.c | 89 +++---
> 
> It's quite a bit of rather special code you add to this catch-all file. How
> about introducing a small new one, e.g. mmio.c (to later maybe become home
> to some further stuff)?

Yes, I didn't find any suitable place, I was also considering
hvm/io.c or hvm/intercept.c, but none looked very good TBH.

The main benefit of placing it in hvm.c is that the functions can all
be static and confined to hvm.c as its only user.

> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -370,7 +370,8 @@ static int hvmemul_do_io(
> >  /* If there is no suitable backing DM, just ignore accesses */
> >  if ( !s )
> >  {
> > -if ( is_mmio && is_hardware_domain(currd) )
> > +if ( is_mmio && is_hardware_domain(currd) &&
> > + !rangeset_contains_singleton(mmio_ro_ranges, 
> > PFN_DOWN(addr)) )
> 
> I think this needs a brief comment - it otherwise looks misplaced here.
> 
> > @@ -585,9 +585,81 @@ static int cf_check hvm_print_line(
> >  return X86EMUL_OKAY;
> >  }
> >  
> > +static int cf_check subpage_mmio_accept(struct vcpu *v, unsigned long addr)
> > +{
> > +p2m_type_t t;
> > +mfn_t mfn = get_gfn_query_unlocked(v->domain, addr, &t);
> > +
> > +return !mfn_eq(mfn, INVALID_MFN) && t == p2m_mmio_direct &&
> > +   !!subpage_mmio_find_page(mfn);
> 
> The !! isn't needed here, is it?

IIRC clang complained about conversion from pointer to integer without
a cast, but maybe that was before adding the further conditions here.

> > +}
> > +
> > +static int cf_check subpage_mmio_read(
> > +struct vcpu *v, unsigned long addr, unsigned int len, unsigned long 
> > *data)
> > +{
> > +struct domain *d = v->domain;
> > +p2m_type_t t;
> > +mfn_t mfn = get_gfn_query(d, addr, &t);
> > +struct subpage_ro_range *entry;
> > +volatile void __iomem *mem;
> > +
> > +*data = ~0UL;
> > +
> > +if ( mfn_eq(mfn, INVALID_MFN) || t != p2m_mmio_direct )
> > +{
> > +put_gfn(d, addr);
> > +return X86EMUL_RETRY;
> > +}
> > +
> > +entry = subpage_mmio_find_page(mfn);
> > +if ( !entry )
> > +{
> > +put_gfn(d, addr);
> > +return X86EMUL_RETRY;
> > +}
> > +
> > +mem = subpage_mmio_map_page(entry);
> > +if ( !mem )
> > +{
> > +put_gfn(d, addr);
> > +gprintk(XENLOG_ERR, "Failed to map page for MMIO read at %#lx\n",
> > +mfn_to_maddr(mfn) + PAGE_OFFSET(addr));
> 
> Makes me wonder whether the function parameter wouldn't better be named
> "curr" and the local variable "currd", to reflect that this log message
> will report appropriate context.

Sure, can adjust.

> Would also logging the guest physical address perhaps be worthwhile here?

Possibly, my first approach was to print gfn -> mfn, but ended up
copying the same message from subpage_mmio_write_emulate() for
symmetry reasons.

> > +return X86EMUL_OKAY;
> > +}
> > +
> > +*data = read_mmio(mem + PA

[PATCH 07/11] drm/xen: Test for imported buffers with drm_gem_is_imported()

2025-04-14 Thread Thomas Zimmermann

Instead of testing import_attach for imported GEM buffers, invoke
drm_gem_is_imported() to do the test. The helper tests the dma_buf
itself while import_attach is just an artifact of the import. Prepares
to make import_attach optional.

Signed-off-by: Thomas Zimmermann 
Cc: Oleksandr Andrushchenko 
Cc: xen-devel@lists.xenproject.org
---
 drivers/gpu/drm/xen/xen_drm_front_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c 
b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index 63112ed975c4..62a83c36fce8 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -203,7 +203,7 @@ void xen_drm_front_gem_free_object_unlocked(struct 
drm_gem_object *gem_obj)
 {
struct xen_gem_object *xen_obj = to_xen_gem_obj(gem_obj);
 
-   if (xen_obj->base.import_attach) {
+   if (drm_gem_is_imported(&xen_obj->base)) {
drm_prime_gem_destroy(&xen_obj->base, xen_obj->sgt_imported);
gem_free_pages_array(xen_obj);
} else {
-- 
2.49.0

Re: [PATCH v2 0/5] Fix lazy mmu mode

2025-04-14 Thread Alexander Gordeev

On Mon, Apr 14, 2025 at 02:22:53PM +0100, Ryan Roberts wrote:
> On 10/04/2025 17:07, Alexander Gordeev wrote:
> >> I'm planning to implement lazy mmu mode for arm64 to optimize vmalloc. As 
> >> part
> >> of that, I will extend lazy mmu mode to cover kernel mappings in vmalloc 
> >> table
> >> walkers. While lazy mmu mode is already used for kernel mappings in a few
> >> places, this will extend it's use significantly.
> >>
> >> Having reviewed the existing lazy mmu implementations in powerpc, sparc 
> >> and x86,
> >> it looks like there are a bunch of bugs, some of which may be more likely 
> >> to
> >> trigger once I extend the use of lazy mmu.
> > 
> > Do you have any idea about generic code issues as result of not adhering to
> > the originally stated requirement:
> > 
> >   /*
> >...
> >* the PTE updates which happen during this window.  Note that using this
> >* interface requires that read hazards be removed from the code.  A read
> >* hazard could result in the direct mode hypervisor case, since the 
> > actual
> >* write to the page tables may not yet have taken place, so reads though
> >* a raw PTE pointer after it has been modified are not guaranteed to be
> >* up to date.
> >...
> >*/
> > 
> > I tried to follow few code paths and at least this one does not look so 
> > good:
> > 
> > copy_pte_range(..., src_pte, ...)
> > ret = copy_nonpresent_pte(..., src_pte, ...)
> > try_restore_exclusive_pte(..., src_pte, ...)// 
> > is_device_exclusive_entry(entry)
> > restore_exclusive_pte(..., ptep, ...)
> > set_pte_at(..., ptep, ...)
> > set_pte(ptep, pte); // save in lazy 
> > mmu mode
> > 
> > // ret == -ENOENT
> > 
> > ptent = ptep_get(src_pte);  // lazy mmu 
> > save is not observed
> > ret = copy_present_ptes(..., ptent, ...);   // wrong ptent 
> > used
> > 
> > I am not aware whether the effort to "read hazards be removed from the code"
> > has ever been made and the generic code is safe in this regard.
> > 
> > What is your take on this?
> 
> Hmm, that looks like a bug to me, at least based on the stated requirements.
> Although this is not a "read through a raw PTE *pointer*", it is a ptep_get().
> The arch code can override that so I guess it has an opportunity to flush. 
> But I
> don't think any arches are currently doing that.
> 
> Probably the simplest fix is to add arch_flush_lazy_mmu_mode() before the
> ptep_get()?

Which would completely revert the very idea of the lazy mmu mode?
(As one would flush on every PTE page table iteration).

> It won't be a problem in practice for arm64, since the pgtables are always
> updated immediately. I just want to use these hooks to defer/batch barriers in
> certain cases.
> 
> And this is a pre-existing issue for the arches that use lazy mmu with
> device-exclusive mappings, which my extending lazy mmu into vmalloc won't
> exacerbate.
> 
> Would you be willing/able to submit a fix?

Well, we have a dozen lazy mmu cases and I would guess this is not the
only piece of code that seems affected. I was thinking about a debug feature
that could help spot all the troubled locations.

Then we could assess and decide if it is feasible to fix. Just turning the
code above into the PTE read-modify-update pattern is quite an exercise...

> Thanks,
> Ryan
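
To make the read hazard discussed above concrete, here is a toy user-space model. It is not kernel code: all names ending in _sketch, and the queue itself, are invented for the example. It assumes an implementation that defers PTE writes while lazy mode is active, in the spirit of the arch_enter_lazy_mmu_mode() / arch_flush_lazy_mmu_mode() / arch_leave_lazy_mmu_mode() hooks.

```c
#include <stddef.h>

typedef unsigned long pte_sketch_t;

#define LAZY_QUEUE_MAX 8

/* One deferred write: which entry to update and with what value. */
struct lazy_update {
    pte_sketch_t *ptep;
    pte_sketch_t val;
};

static struct lazy_update lazy_queue[LAZY_QUEUE_MAX];
static size_t lazy_count;
static int lazy_active;

/* Apply all queued writes to the "page table". */
static void flush_lazy_mmu_sketch(void)
{
    for ( size_t i = 0; i < lazy_count; i++ )
        *lazy_queue[i].ptep = lazy_queue[i].val;
    lazy_count = 0;
}

static void enter_lazy_mmu_sketch(void)
{
    lazy_active = 1;
}

static void leave_lazy_mmu_sketch(void)
{
    flush_lazy_mmu_sketch();
    lazy_active = 0;
}

/* In lazy mode the write is only queued; otherwise it lands at once. */
static void set_pte_sketch(pte_sketch_t *ptep, pte_sketch_t val)
{
    if ( !lazy_active )
    {
        *ptep = val;
        return;
    }

    if ( lazy_count == LAZY_QUEUE_MAX )
        flush_lazy_mmu_sketch();

    lazy_queue[lazy_count].ptep = ptep;
    lazy_queue[lazy_count].val = val;
    lazy_count++;
}

/* A plain read through the pointer: it cannot see queued writes. */
static pte_sketch_t ptep_get_sketch(const pte_sketch_t *ptep)
{
    return *ptep;
}
```

In this model, a set_pte_sketch() followed by ptep_get_sketch() inside the lazy window returns the old value until a flush happens, which mirrors the copy_pte_range() path quoted above. Flushing before every read restores correctness but, as noted in the reply, gives up the batching.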

[PATCH AUTOSEL 6.6 06/24] xen: Change xen-acpi-processor dom0 dependency

2025-04-14 Thread Sasha Levin

From: Jason Andryuk 

[ Upstream commit 0f2946bb172632e122d4033e0b03f85230a29510 ]

xen-acpi-processor functions under a PVH dom0 with only a
xen_initial_domain() runtime check.  Change the Kconfig dependency from
PV dom0 to generic dom0 to reflect that.

Suggested-by: Jan Beulich 
Signed-off-by: Jason Andryuk 
Reviewed-by: Juergen Gross 
Tested-by: Jan Beulich 
Signed-off-by: Juergen Gross 
Message-ID: <20250331172913.51240-1-jason.andr...@amd.com>
Signed-off-by: Sasha Levin 
---
 drivers/xen/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index d43153fec18ea..af5c214b22069 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -278,7 +278,7 @@ config XEN_PRIVCMD_IRQFD
 
 config XEN_ACPI_PROCESSOR
tristate "Xen ACPI processor"
-   depends on XEN && XEN_PV_DOM0 && X86 && ACPI_PROCESSOR && CPU_FREQ
+   depends on XEN && XEN_DOM0 && X86 && ACPI_PROCESSOR && CPU_FREQ
default m
help
  This ACPI processor uploads Power Management information to the Xen
-- 
2.39.5

Re: [PATCH v3 09/16] x86/hyperlaunch: locate dom0 kernel with hyperlaunch

2025-04-14 Thread Alejandro Vallejo

On Thu Apr 10, 2025 at 11:58 AM BST, Jan Beulich wrote:
> On 08.04.2025 18:07, Alejandro Vallejo wrote:
>> From: "Daniel P. Smith" 
>> 
>> Look for a subnode of type `multiboot,kernel` within a domain node. If
>> found, locate it using the multiboot module helper to generically ensure
>> it lives in the module list. If the bootargs property is present and
>> there was not an MB1 string, then use the command line from the device
>> tree definition.
>> 
>> Signed-off-by: Daniel P. Smith 
>> Signed-off-by: Jason Andryuk 
>> Signed-off-by: Alejandro Vallejo 
>> ---
>> v3:
>> * Add const to fdt
>> * Remove idx == NULL checks
>> * Add BUILD_BUG_ON for MAX_NR_BOOTMODS fitting in a uint32_t
>
> At least this one looks to rather belong into patch 08?

Urg, yes. Sorry. There was a lot of code motion when I factored out the
helpers.

>
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -155,6 +155,52 @@ int __init fdt_read_multiboot_module(const void *fdt, 
>> int node,
>>  return idx;
>>  }
>>  
>> +static int __init process_domain_node(
>> +struct boot_info *bi, const void *fdt, int dom_node)
>> +{
>> +int node;
>> +struct boot_domain *bd = &bi->domains[bi->nr_domains];
>> +const char *name = fdt_get_name(fdt, dom_node, NULL) ?: "unknown";
>> +int address_cells = fdt_address_cells(fdt, dom_node);
>> +int size_cells = fdt_size_cells(fdt, dom_node);
>
> Oh, okay - regarding my earlier comment elsewhere: If the sizes come from DT,
> then of course ASSERT_UNREACHABLE() can't be used at the place where bogus
> ones are rejected.
>
>> +fdt_for_each_subnode(node, fdt, dom_node)
>> +{
>> +if ( fdt_node_check_compatible(fdt, node, "multiboot,kernel") == 0 )
>> +{
>
> When the loop body is merely an if() with a non-negligible body, inverting
> the condition and using "continue" is usually better. Much like you do ...

This becomes a chain of if conditions later on, one per property.

>
>> +int idx;
>> +
>> +if ( bd->kernel )
>> +{
>> +printk(XENLOG_ERR "Duplicate kernel module for domain %s\n",
>> +   name);
>> +continue;
>
> ... here already.
>
> Jan

Cheers,
Alejandro

Re: [PATCH v3 2/7] arm/mpu: Provide access to the MPU region from the C code

2025-04-14 Thread Luca Fancellu

Hi Julien,

> On 14 Apr 2025, at 12:41, Julien Grall  wrote:
> 
> Hi Luca,
> 
> On 11/04/2025 23:56, Luca Fancellu wrote:
>> Implement some utility function in order to access the MPU regions
>> from the C world.
>> Signed-off-by: Luca Fancellu 
>> ---
>> v3 changes:
>>  - Moved PRBAR0_EL2/PRLAR0_EL2 to arm64 specific
>>  - Modified prepare_selector() to be easily made a NOP
>>for Arm32, which can address up to 32 region without
>>changing selector and it is also its maximum amount
>>of MPU regions.
>> ---
>> ---
>>  xen/arch/arm/include/asm/arm64/mpu.h |   7 ++
>>  xen/arch/arm/include/asm/mpu.h   |   1 +
>>  xen/arch/arm/include/asm/mpu/mm.h|  24 +
>>  xen/arch/arm/mpu/mm.c| 125 +++
>>  4 files changed, 157 insertions(+)
>> diff --git a/xen/arch/arm/include/asm/arm64/mpu.h 
>> b/xen/arch/arm/include/asm/arm64/mpu.h
>> index 4d2bd7d7877f..b4e1ecdf741d 100644
>> --- a/xen/arch/arm/include/asm/arm64/mpu.h
>> +++ b/xen/arch/arm/include/asm/arm64/mpu.h
>> @@ -8,6 +8,13 @@
>>#ifndef __ASSEMBLY__
>>  +/*
>> + * The following are needed for the case generators 
>> GENERATE_WRITE_PR_REG_CASE
>> + * and GENERATE_READ_PR_REG_CASE with num==0
>> + */
>> +#define PRBAR0_EL2 PRBAR_EL2
>> +#define PRLAR0_EL2 PRLAR_EL2
> 
> Rather than aliasing, shouldn't we just rename PR{B,L}AR_EL2 to 
> PR{B,L}AR0_EL2? This would avoid the code mixing the two.

PR{B,L}AR0_ELx does not really exist; PR{B,L}AR<n>_ELx exists only for n=1..15.
Here I’m only using this “alias” for the generator, but PR{B,L}AR_EL2 are the
real registers.
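
To illustrate why a token-pasting generator needs such an alias, here is a
minimal, self-contained sketch. It is not the code from the patch: the macro
GENERATE_PR_REG_NAME_CASE() and the helper prbar_name() are hypothetical, and
the registers are modelled as plain strings rather than real system
registers. Only the preprocessor mechanism is the point.

```c
#include <stddef.h>

/* Stand-ins for the real register names: in this sketch they are strings. */
#define PRBAR_EL2   "PRBAR_EL2"
#define PRBAR1_EL2  "PRBAR1_EL2"
#define PRBAR2_EL2  "PRBAR2_EL2"

/* Alias, so that pasting num == 0 yields a name that resolves. */
#define PRBAR0_EL2  PRBAR_EL2

/* Hypothetical generator: pastes the region number into the register name. */
#define GENERATE_PR_REG_NAME_CASE(num) \
    case num:                          \
        return PRBAR##num##_EL2

/* Returns the (modelled) register used for selector 'sel', or NULL. */
static const char *prbar_name(unsigned int sel)
{
    switch ( sel )
    {
    GENERATE_PR_REG_NAME_CASE(0); /* PRBAR0_EL2 -> PRBAR_EL2 via the alias */
    GENERATE_PR_REG_NAME_CASE(1);
    GENERATE_PR_REG_NAME_CASE(2);
    default:
        return NULL;
    }
}
```

Without the PRBAR0_EL2 define, the num == 0 case would paste to an
identifier that is not defined anywhere, so the alias keeps the generator
uniform while the real register name stays untouched.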

> 
>> +
>>  /* Protection Region Base Address Register */
>>  typedef union {
>>  struct __packed {
>> diff --git a/xen/arch/arm/include/asm/mpu.h b/xen/arch/arm/include/asm/mpu.h
>> index e148c705b82c..59ff22c804c1 100644
>> --- a/xen/arch/arm/include/asm/mpu.h
>> +++ b/xen/arch/arm/include/asm/mpu.h
>> @@ -13,6 +13,7 @@
>>  #define MPU_REGION_SHIFT  6
>>  #define MPU_REGION_ALIGN  (_AC(1, UL) << MPU_REGION_SHIFT)
>>  #define MPU_REGION_MASK   (~(MPU_REGION_ALIGN - 1))
>> +#define MPU_REGION_RES0   (0xFFFULL << 52)
>>#define NUM_MPU_REGIONS_SHIFT   8
>>  #define NUM_MPU_REGIONS (_AC(1, UL) << NUM_MPU_REGIONS_SHIFT)
>> diff --git a/xen/arch/arm/include/asm/mpu/mm.h 
>> b/xen/arch/arm/include/asm/mpu/mm.h
>> index 86f33d9836b7..5cabe9d111ce 100644
>> --- a/xen/arch/arm/include/asm/mpu/mm.h
>> +++ b/xen/arch/arm/include/asm/mpu/mm.h
>> @@ -8,6 +8,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>extern struct page_info *frame_table;
>>  @@ -29,6 +30,29 @@ static inline struct page_info *virt_to_page(const void 
>> *v)
>>  return mfn_to_page(mfn);
>>  }
>>  +/* Utility function to be used whenever MPU regions are modified */
>> +static inline void context_sync_mpu(void)
>> +{
>> +/*
>> + * ARM DDI 0600B.a, C1.7.1
>> + * Writes to MPU registers are only guaranteed to be visible following a
>> + * Context synchronization event and DSB operation.
> 
> I know we discussed this before. I find it odd that the specification says 
> "context synchronization event and DSB operation". At least to me, it implies 
> "isb + dsb" not the other way around. Has this been clarified in a newer 
> version of the specification?

Unfortunately no. I’m looking at the latest one (Arm® Architecture Reference 
Manual Supplement Armv8, for R-profile AArch64 architecture, 0600B.a) and it
has the same wording. However, I spoke internally with the Cortex-R architects
and they told me to use DSB+ISB.

> 
>> + */
>> +dsb(sy);
>> +isb();
>> +}
>> +
>> +/*
>> + * The following API require context_sync_mpu() after being used to modifiy 
>> MPU
> 
> typo: s/require/requires/ and s/modifiy/modify/
> 
>> + * regions:
>> + *  - write_protection_region
>> + */
>> +
>> +/* Reads the MPU region with index 'sel' from the HW */
>> +extern void read_protection_region(pr_t *pr_read, uint8_t sel);
> 
> I am probably missing something. But don't you have a copy of pr_t in 
> xen_mpumap? If so, can't we use the cached version to avoid accessing the 
> system registers?

This API is meant to read/write the registers. The last patch uses it to
populate xen_mpumap, and further along the tree it is also used in
dump_hyp_walk. Given your comment on the last patch, this could change if we
need to update xen_mpumap from the asm code.

> 
>> +/* Writes the MPU region with index 'sel' to the HW */
>> +extern void write_protection_region(const pr_t *pr_write, uint8_t sel);
>> +
>>  #endif /* __ARM_MPU_MM_H__ */
>>/*
>> diff --git a/xen/arch/arm/mpu/mm.c b/xen/arch/arm/mpu/mm.c
>> index f83ce04fef8a..e522ce53c357 100644
>> --- a/xen/arch/arm/mpu/mm.c
>> +++ b/xen/arch/arm/mpu/mm.c
>> @@ -8,12 +8,30 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>> +#include 
>>struct page_info *frame_table;
>>/* EL2 Xen MPU memory region mapping table. */
>>  pr_t xen_mpumap[MAX_MPU_REGIONS];
>>  +#define GENERATE_WRITE_PR_REG_CASE(num

Re: [PATCH v1 01/14] xen/riscv: implement get_s_time()

2025-04-14 Thread Oleksii Kurochko



On 4/10/25 2:52 PM, Jan Beulich wrote:
> On 08.04.2025 17:57, Oleksii Kurochko wrote:
>> @@ -23,6 +24,11 @@ static inline cycles_t get_cycles(void)
>>   return csr_read(CSR_TIME);
>>   }
>>   
>> +static inline s_time_t ticks_to_ns(uint64_t ticks)
>> +{
>> +return muldiv64(ticks, SECONDS(1), 1000 * cpu_khz);
>> +}
>
> Why the extra multiplication by 1000? I.e. why not
> "muldiv64(ticks, MILLISECONDS(1), cpu_khz)", getting away with just one
> multiplication and a reduced risk of encountering intermediate overflow
> (affecting only hypothetical above 4THz CPUs then)?


Multiplication by 1000 was needed to convert khz to hz, but yes, your option
would be better.
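
As a rough illustration of the overflow point, here is a self-contained
sketch comparing the two forms. It makes assumptions that are not taken from
the patch: muldiv64_sketch() is a hypothetical stand-in for muldiv64() with a
32-bit multiplier and divisor (which is what the "4THz" remark suggests), and
ticks_to_ns_hz()/ticks_to_ns_khz() are made-up names for the two variants.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC   1000000000u /* what SECONDS(1) is assumed to yield */
#define NS_PER_MSEC  1000000u    /* what MILLISECONDS(1) is assumed to yield */

/*
 * Hypothetical stand-in for muldiv64(): computes a * b / c through a 96-bit
 * intermediate. The 32-bit b and c are an assumption of this sketch, and the
 * quotient is assumed to fit in 64 bits.
 */
static uint64_t muldiv64_sketch(uint64_t a, uint32_t b, uint32_t c)
{
    uint64_t lo, hi;

    assert(c != 0);

    lo = (a & 0xffffffffu) * b;      /* low half of a, times b */
    hi = (a >> 32) * b + (lo >> 32); /* high half of a, plus carry from lo */

    return ((hi / c) << 32) | ((((hi % c) << 32) | (lo & 0xffffffffu)) / c);
}

/* Variant from the patch: the divisor is converted from kHz to Hz first. */
static uint64_t ticks_to_ns_hz(uint64_t ticks, unsigned long cpu_khz)
{
    /* The divisor is truncated to 32 bits: it wraps above ~4.29 GHz. */
    return muldiv64_sketch(ticks, NS_PER_SEC, (uint32_t)(1000 * cpu_khz));
}

/* Suggested variant: scale the multiplier down instead of the divisor up. */
static uint64_t ticks_to_ns_khz(uint64_t ticks, unsigned long cpu_khz)
{
    return muldiv64_sketch(ticks, NS_PER_MSEC, (uint32_t)cpu_khz);
}
```

Under these assumptions both variants agree for a 1 GHz clock, while for a
hypothetical 5 GHz clock only the kHz-based one still maps 5,000,000,000
ticks to one second.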




>> --- a/xen/arch/riscv/time.c
>> +++ b/xen/arch/riscv/time.c
>> @@ -4,10 +4,17 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   
>>   unsigned long __ro_after_init cpu_khz; /* CPU clock frequency in kHz. */
>>   uint64_t __ro_after_init boot_clock_cycles;
>>   
>> +s_time_t get_s_time(void)
>> +{
>> +uint64_t ticks = get_cycles() - boot_clock_cycles;
>> +return ticks_to_ns(ticks);
>
> Nit: Blank line between declaration(s) and statement(s) please, as well as
> ahead of the main "return" of a function.
>
> Happy to make both adjustments upon committing, so long as you agree; then:
> Reviewed-by: Jan Beulich


I'll be happy with that.

Thank you very much.

~ Oleksii

Re: [PATCH 3/5] x86/hvm: fix handling of accesses to partial r/o MMIO pages

2025-04-14 Thread Jan Beulich

On 14.04.2025 15:53, Roger Pau Monné wrote:
> On Mon, Apr 14, 2025 at 08:33:44AM +0200, Jan Beulich wrote:
>> On 11.04.2025 12:54, Roger Pau Monne wrote:
>>> ---
>>>  xen/arch/x86/hvm/emulate.c | 47 +-
>>>  xen/arch/x86/hvm/hvm.c | 89 +++---
>>
>> It's quite a bit of rather special code you add to this catch-all file. How
>> about introducing a small new one, e.g. mmio.c (to later maybe become home
>> to some further stuff)?
> 
> Yes, I didn't find any suitable place, I was also considering
> hvm/io.c or hvm/intercept.c, but none looked very good TBH.
> 
> The main benefit of placing it in hvm.c is that the functions can all
> be static and confined to hvm.c as its only user.

I understand that. Still, if we went by that criterion, we'd best put all of
our code in a single file ;-)

>>> +static int cf_check subpage_mmio_write(
>>> +struct vcpu *v, unsigned long addr, unsigned int len, unsigned long 
>>> data)
>>> +{
>>> +struct domain *d = v->domain;
>>> +p2m_type_t t;
>>> +mfn_t mfn = get_gfn_query(d, addr, &t);
>>> +
>>> +if ( mfn_eq(mfn, INVALID_MFN) || t != p2m_mmio_direct )
>>> +{
>>> +put_gfn(d, addr);
>>> +return X86EMUL_RETRY;
>>> +}
>>> +
>>> +subpage_mmio_write_emulate(mfn, PAGE_OFFSET(addr), data, len);
>>> +
>>> +put_gfn(d, addr);
>>> +return X86EMUL_OKAY;
>>> +}
>>
>> Unlike the read path this doesn't return RETRY when subpage_mmio_find_page()
>> fails (indicating something must have changed after "accept" said yes).
> 
> Yeah, I've noticed this, but didn't feel like modifying
> subpage_mmio_write_emulate() for this.  The list of partial r/o MMIO
> pages cannot change after system boot, so accept returning yes and not
> finding a page here would likely warrant an ASSERT_UNREACHABLE().
> 
> Would you like me to modify subpage_mmio_write_emulate to make it
> return an error code?

I'd be happy with the two paths being in sync in whichever way that's
achieved. The argument you give equally holds for the read path, after
all.

>>> @@ -1981,7 +2056,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
>>> long gla,
>>>   */
>>>  if ( (p2mt == p2m_mmio_dm) ||
>>>   (npfec.write_access &&
>>> -  (p2m_is_discard_write(p2mt) || (p2mt == p2m_ioreq_server))) )
>>> +  (p2m_is_discard_write(p2mt) || (p2mt == p2m_ioreq_server) ||
>>> +   /* MMIO entries can be r/o if the target mfn is in 
>>> mmio_ro_ranges. */
>>> +   (p2mt == p2m_mmio_direct))) )
>>>  {
>>>  if ( !handle_mmio_with_translation(gla, gfn, npfec) )
>>>  hvm_inject_hw_exception(X86_EXC_GP, 0);
>>
>> Aren't we handing too many things to handle_mmio_with_translation() this
>> way? At the very least you're losing ...
>>
>>> @@ -2033,14 +2110,6 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
>>> long gla,
>>>  goto out_put_gfn;
>>>  }
>>>  
>>> -if ( (p2mt == p2m_mmio_direct) && npfec.write_access && npfec.present 
>>> &&
>>
>> ... the .present check.
> 
> Isn't the p2mt == p2m_mmio_direct check already ensuring the entry is
> present?  Otherwise its type would be p2m_invalid or p2m_mmio_dm?

Yes (to the 1st question), it kind of is.

> It did seem to me the other checks in this function already assume
> that by having a valid type the entry is present.

Except for the code above, where we decided to play safe. At the very least,
if you drop such a check, please say a justifying word in the description.

>> I'm also concerned of e.g. VT-x'es APIC access MFN, which is
>> p2m_mmio_direct.
> 
> But that won't go into hvm_hap_nested_page_fault() when using
> cpu_has_vmx_virtualize_apic_accesses (and thus having an APIC page
> mapped as p2m_mmio_direct)?
> 
> It would instead be an EXIT_REASON_APIC_ACCESS vmexit which is handled
> differently?

All true as long as things work as expected (potentially including the guest
also behaving as expected). Also this was explicitly only an example I could
readily think of. I'm simply wary of handle_mmio_with_translation() now
getting things to handle it's not meant to ever see.

Jan

[PATCH v2 2/8] xen/common: dom0less: make some parts of Arm's CONFIG_DOM0LESS common

2025-04-14 Thread Oleksii Kurochko

Move some parts of Arm's Dom0Less code to be reused by other architectures.
At the moment, RISC-V is going to reuse these parts.

Move dom0less-build.h from the Arm-specific directory to asm-generic,
as this header is expected to be the same across architectures, with
some updates: add declarations of construct_domU(),
arch_xen_domctl_createdomain() and arch_create_domus(), as some parts
are still architecture-specific.

Introduce HAS_DOM0LESS to provide the ability to enable generic Dom0less
code for an architecture.

Relocate the CONFIG_DOM0LESS configuration to common code, adding
"depends on HAS_DOM0LESS" so as not to break builds for architectures
which don't support CONFIG_DOM0LESS. In particular, this avoids having
to provide stubs for construct_domU(), arch_xen_domctl_createdomain()
and arch_create_domus() in the case of *-randconfig, which may set
CONFIG_DOM0LESS=y.

Move is_dom0less_mode() function to the common code, as it depends on
boot modules that are already part of the common code.

Move create_domUs() function to the common code with some updates:
- Add function arch_xen_domctl_createdomain() as structure
  xen_domctl_createdomain may have some arch-specific information and
  initialization.
- Add arch_create_domus() to cover parsing of arch-specific features,
  for example, SVE (Scalable Vector Extension) exists only in Arm.

Signed-off-by: Oleksii Kurochko 
---
Changes in v2:
 - Convert 'depends on Arm' to 'depends on HAS_DOM0LESS' for
   CONFIG_DOM0LESS_BOOT.
 - Change 'default Arm' to 'default y' for CONFIG_DOM0LESS_BOOT as there is
   dependency on HAS_DOM0LESS.
 - Introduce HAS_DOM0LESS and enable it for Arm.
 - Update the commit message.
---
 xen/arch/arm/Kconfig  |   9 +-
 xen/arch/arm/dom0less-build.c | 270 ++
 xen/arch/arm/include/asm/Makefile |   1 +
 xen/arch/arm/include/asm/dom0less-build.h |  32 ---
 xen/common/Kconfig|  12 +
 xen/common/device-tree/Makefile   |   1 +
 xen/common/device-tree/dom0less-build.c   | 161 +
 xen/include/asm-generic/dom0less-build.h  |  40 
 8 files changed, 287 insertions(+), 239 deletions(-)
 delete mode 100644 xen/arch/arm/include/asm/dom0less-build.h
 create mode 100644 xen/common/device-tree/dom0less-build.c
 create mode 100644 xen/include/asm-generic/dom0less-build.h

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 565f288331..060389c3c8 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -15,6 +15,7 @@ config ARM
select GENERIC_UART_INIT
select HAS_ALTERNATIVE if HAS_VMAP
select HAS_DEVICE_TREE
+   select HAS_DOM0LESS
select HAS_UBSAN
 
 config ARCH_DEFCONFIG
@@ -119,14 +120,6 @@ config GICV2
  Driver for the ARM Generic Interrupt Controller v2.
  If unsure, say Y
 
-config DOM0LESS_BOOT
-   bool "Dom0less boot support" if EXPERT
-   default y
-   help
- Dom0less boot support enables Xen to create and start domU guests 
during
- Xen boot without the need of a control domain (Dom0), which could be
- present anyway.
-
 config GICV3
bool "GICv3 driver"
depends on !NEW_VGIC
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index bd15563750..7ec3f85795 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -20,38 +20,6 @@
 #include 
 #include 
 
-bool __init is_dom0less_mode(void)
-{
-struct bootmodules *mods = &bootinfo.modules;
-struct bootmodule *mod;
-unsigned int i;
-bool dom0found = false;
-bool domUfound = false;
-
-/* Look into the bootmodules */
-for ( i = 0 ; i < mods->nr_mods ; i++ )
-{
-mod = &mods->module[i];
-/* Find if dom0 and domU kernels are present */
-if ( mod->kind == BOOTMOD_KERNEL )
-{
-if ( mod->domU == false )
-{
-dom0found = true;
-break;
-}
-else
-domUfound = true;
-}
-}
-
-/*
- * If there is no dom0 kernel but at least one domU, then we are in
- * dom0less mode
- */
-return ( !dom0found && domUfound );
-}
-
 #ifdef CONFIG_VGICV2
 static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
 {
@@ -869,8 +837,8 @@ static inline int domain_p2m_set_allocation(struct domain 
*d, uint64_t mem,
 }
 #endif /* CONFIG_ARCH_PAGING_MEMPOOL */
 
-static int __init construct_domU(struct domain *d,
- const struct dt_device_node *node)
+int __init construct_domU(struct domain *d,
+  const struct dt_device_node *node)
 {
 struct kernel_info kinfo = KERNEL_INFO_INIT;
 const char *dom0less_enhanced;
@@ -965,188 +933,92 @@ static int __init construct_domU(struct domain *d,
 return alloc_xenstore_params(&kinfo);
 }
 
-void __init create_domUs(void)
-{
-struct dt_devi

[PATCH v2 6/8] xen/common: dom0less: introduce common kernel.c

2025-04-14 Thread Oleksii Kurochko

The following functions don't have anything arch-specific, so they are
moved to common:
- kernel_probe()
- kernel_load()
- output_length()

Only the functions necessary for dom0less are moved.

The following changes are done:
- Swap __init and return type of kernel_decompress() function to be
  consistent with definitions of functions in other files. The same
  for output_length().
- Wrap the call of kernel_uimage_probe() in kernel_probe() in
  "ifdef CONFIG_ARM", as uImage isn't really used nowadays; the
  kernel_uimage_probe() call is left here just for compatibility with
  Arm code.
- Introduce kernel_zimage_probe() to cover the case that arch can have
  different zimage header.
- Add ASSERT() for kernel_load() to check that its argument isn't NULL.
- Make kernel_uimage_probe() non-static in Arm's code as it is used in
  common/kernel.c.

Introduce CONFIG_DOMAIN_BUILD_HELPERS to not provide stubs for archs
which don't provide enough functionality to enable it.
Select CONFIG_DOMAIN_BUILD_HELPERS for CONFIG_ARM as only Arm supports
it, at the moment.

Signed-off-by: Oleksii Kurochko 
---
Change in v2:
 - Drop inclusion of asm/kernel.h in kernel.c as everything necessary has
   been moved to xen/fdt-kernel.h.
---
 xen/arch/arm/Kconfig|   1 +
 xen/arch/arm/kernel.c   | 221 +
 xen/common/Kconfig  |   9 +-
 xen/common/device-tree/Makefile |   1 +
 xen/common/device-tree/kernel.c | 242 
 xen/include/xen/fdt-kernel.h|  13 ++
 6 files changed, 271 insertions(+), 216 deletions(-)
 create mode 100644 xen/common/device-tree/kernel.c

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 060389c3c8..d63c0dc669 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -11,6 +11,7 @@ config ARM_64
 
 config ARM
def_bool y
+   select DOMAIN_BUILD_HELPERS
select FUNCTION_ALIGNMENT_4B
select GENERIC_UART_INIT
select HAS_ALTERNATIVE if HAS_VMAP
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 164f417e75..1a5ae4b95c 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -161,105 +161,6 @@ static void __init kernel_zimage_load(struct kernel_info 
*info)
 iounmap(kernel);
 }
 
-static __init uint32_t output_length(char *image, unsigned long image_len)
-{
-return *(uint32_t *)&image[image_len - 4];
-}
-
-static __init int kernel_decompress(struct bootmodule *mod, uint32_t offset)
-{
-char *output, *input;
-char magic[2];
-int rc;
-unsigned int kernel_order_out;
-paddr_t output_size;
-struct page_info *pages;
-mfn_t mfn;
-int i;
-paddr_t addr = mod->start;
-paddr_t size = mod->size;
-
-if ( size < offset )
-return -EINVAL;
-
-/*
- * It might be that gzip header does not appear at the start address
- * (e.g. in case of compressed uImage) so take into account offset to
- * gzip header.
- */
-addr += offset;
-size -= offset;
-
-if ( size < 2 )
-return -EINVAL;
-
-copy_from_paddr(magic, addr, sizeof(magic));
-
-/* only gzip is supported */
-if ( !gzip_check(magic, size) )
-return -EINVAL;
-
-input = ioremap_cache(addr, size);
-if ( input == NULL )
-return -EFAULT;
-
-output_size = output_length(input, size);
-kernel_order_out = get_order_from_bytes(output_size);
-pages = alloc_domheap_pages(NULL, kernel_order_out, 0);
-if ( pages == NULL )
-{
-iounmap(input);
-return -ENOMEM;
-}
-mfn = page_to_mfn(pages);
-output = vmap_contig(mfn, 1 << kernel_order_out);
-
-rc = perform_gunzip(output, input, size);
-clean_dcache_va_range(output, output_size);
-iounmap(input);
-vunmap(output);
-
-if ( rc )
-{
-free_domheap_pages(pages, kernel_order_out);
-return rc;
-}
-
-mod->start = page_to_maddr(pages);
-mod->size = output_size;
-
-/*
- * Need to free pages after output_size here because they won't be
- * freed by discard_initial_modules
- */
-i = PFN_UP(output_size);
-for ( ; i < (1 << kernel_order_out); i++ )
-free_domheap_page(pages + i);
-
-/*
- * When using static heap feature, don't give bootmodules memory back to
- * the heap allocator
- */
-if ( using_static_heap )
-return 0;
-
-/*
- * When freeing the kernel, we need to pass the module start address and
- * size as they were before taking an offset to gzip header into account,
- * so that the entire region will be freed.
- */
-addr -= offset;
-size += offset;
-
-/*
- * Free the original kernel, update the pointers to the
- * decompressed kernel
- */
-fw_unreserved_regions(addr, addr + size, init_domheap_pages, 0);
-
-return 0;
-}
-
 /*
  * Uimage CPU Architecture Codes
  */
@@ -272,8 +173,8 @@ static __init int kernel_decompress(struct bootmodule *mod, 
uint32_t offset)
 /*
  * Check if

[PATCH v2 1/8] xen/arm: drop declaration of handle_device_interrupts()

2025-04-14 Thread Oleksii Kurochko

There are no users of handle_device_interrupts(), so its declaration
can be dropped.

Signed-off-by: Oleksii Kurochko 
---
Changes in V2:
- New patch.
---
 xen/arch/arm/include/asm/domain_build.h | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/xen/arch/arm/include/asm/domain_build.h 
b/xen/arch/arm/include/asm/domain_build.h
index 134290853c..38546de477 100644
--- a/xen/arch/arm/include/asm/domain_build.h
+++ b/xen/arch/arm/include/asm/domain_build.h
@@ -27,17 +27,6 @@ void evtchn_allocate(struct domain *d);
 
 unsigned int get_allocation_size(paddr_t size);
 
-/*
- * handle_device_interrupts retrieves the interrupts configuration from
- * a device tree node and maps those interrupts to the target domain.
- *
- * Returns:
- *   < 0 error
- *   0   success
- */
-int handle_device_interrupts(struct domain *d, struct dt_device_node *dev,
- bool need_mapping);
-
 /*
  * Helper to write an interrupts with the GIC format
  * This code is assuming the irq is an PPI.
-- 
2.49.0

Re: [PATCH v3 1/6] CI: Rename intermediate artefacts in qemu-* scripts

2025-04-14 Thread Anthony PERARD

On Mon, Apr 14, 2025 at 12:08:58PM +0100, Andrew Cooper wrote:
> Right now, we have initrd.cpio.gz as domU, and initrd.tar.gz as the base for
> dom0.
> 
> Rename initrd.cpio.gz to domU-rootfs.cpio.gz, and xen-rootfs.cpio.gz to
> dom0-rootfs.cpio.gz to make it clearer which is which.  Rename the VM from
> test to domU.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

This also renames some long options to short options without an
explanation, but the change looks fine. (I usually prefer long options
in scripts because it means you don't need to check `man` to figure out
what a command line does.)

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PATCH v3 12/16] x86/hyperlaunch: add domain id parsing to domain config

2025-04-14 Thread Alejandro Vallejo

On Thu Apr 10, 2025 at 12:49 PM BST, Jan Beulich wrote:
> On 08.04.2025 18:07, Alejandro Vallejo wrote:
>> From: "Daniel P. Smith" 
>> 
>> Introduce the ability to specify the desired domain id for the domain
>> definition. The domain id will be populated in the domid property of the
>> domain
>> node in the device tree configuration.
>
> Nit: Odd splitting of lines.

Fixed

>
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -8,6 +8,7 @@
>>  #include 
>>  
>>  #include 
>> +#include 
>
> What is this needed for?

get_initial_domain_id(), but that ought to come from xen/domain.h instead.

Fixed.

>
>> @@ -158,12 +159,42 @@ int __init fdt_read_multiboot_module(const void *fdt, 
>> int node,
>>  static int __init process_domain_node(
>>  struct boot_info *bi, const void *fdt, int dom_node)
>>  {
>> -int node;
>> +int node, property;
>>  struct boot_domain *bd = &bi->domains[bi->nr_domains];
>>  const char *name = fdt_get_name(fdt, dom_node, NULL) ?: "unknown";
>>  int address_cells = fdt_address_cells(fdt, dom_node);
>>  int size_cells = fdt_size_cells(fdt, dom_node);
>>  
>> +fdt_for_each_property_offset(property, fdt, dom_node)
>> +{
>> +const struct fdt_property *prop;
>> +const char *prop_name;
>> +int name_len;
>> +
>> +prop = fdt_get_property_by_offset(fdt, property, NULL);
>> +if ( !prop )
>> +continue; /* silently skip */
>> +
>> +prop_name = fdt_get_string(fdt, fdt32_to_cpu(prop->nameoff), 
>> &name_len);
>> +
>> +if ( strncmp(prop_name, "domid", name_len) == 0 )
>> +{
>> +uint32_t val = DOMID_INVALID;
>> +if ( fdt_prop_as_u32(prop, &val) != 0 )
>
> Nit: Blank line please between declaration(s) and statement(s).

Ack.

>
>> +{
>> +printk("  failed processing domain id for domain %s\n", 
>> name);
>> +return -EINVAL;
>> +}
>> +if ( val >= DOMID_FIRST_RESERVED )
>> +{
>> +printk("  invalid domain id for domain %s\n", name);
>> +return -EINVAL;
>> +}
>> +bd->domid = (domid_t)val;
>
> And a conflict with other domains' IDs will not be complained about?

Hmmm... sure, I can iterate the domlist and check.
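
A minimal sketch of what such a check could look like, under stated
assumptions: struct boot_domain_sketch is a hypothetical, trimmed-down
stand-in for struct boot_domain, and domid_in_use() is a made-up helper name,
not something taken from the series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t domid_t; /* same width as Xen's domid_t */

/* Hypothetical, trimmed-down stand-in for struct boot_domain. */
struct boot_domain_sketch {
    domid_t domid;
};

/* Returns true if one of the first 'nr' entries already claims 'domid'. */
static bool domid_in_use(const struct boot_domain_sketch *doms, size_t nr,
                         domid_t domid)
{
    size_t i;

    for ( i = 0; i < nr; i++ )
        if ( doms[i].domid == domid )
            return true;

    return false;
}
```

The idea would be to run this over the domains parsed so far before
assigning bd->domid, and to reject the node on a clash. The scan is linear,
which should be fine for the small number of domains described in a DT.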

>
>> +printk("  domid: %d\n", bd->domid);
>
> If the error messages log "name" for (I suppose) disambiguation, why would
> the success message here not also do so?
>
>> @@ -233,6 +264,12 @@ static int __init process_domain_node(
>>  return -ENODATA;
>>  }
>>  
>> +if ( bd->domid == DOMID_INVALID )
>> +bd->domid = get_initial_domain_id();
>> +else if ( bd->domid != get_initial_domain_id() )
>> +printk(XENLOG_WARNING
>> +   "WARN: Booting without initial domid not supported.\n");
>
> I'm not a native speaker, but (or perhaps because of that) "without" feels
> wrong here.

It's probably the compound effect of without and "not supported". The
statement is correct, but it's arguably a bit obtuse.

I'll replace it with "WARN: Unsupported boot with missing initial domid.".

As for the first branch of that clause... I'm not sure we want to
support running with DTs that are missing the domid property.

>
> Also nit: No full stops please at the end of log messages, at least in the
> common case.

Ack

>
> Is the resolving of DOMID_INVALID invalid really needed both here and ...
>
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -1033,8 +1033,9 @@ static struct domain *__init create_dom0(struct 
>> boot_info *bi)
>>  if ( iommu_enabled )
>>  dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
>>  
>> -/* Create initial domain.  Not d0 for pvshim. */
>> -bd->domid = get_initial_domain_id();
>> +if ( bd->domid == DOMID_INVALID )
>> +/* Create initial domain.  Not d0 for pvshim. */
>> +bd->domid = get_initial_domain_id();
>
> ... here?

I'll rationalise all that in v4.

>
>> @@ -23,6 +24,16 @@ static inline uint64_t  __init fdt_cell_as_u64(const 
>> fdt32_t *cell)
>>  return ((uint64_t)fdt32_to_cpu(cell[0]) << 32) | fdt32_to_cpu(cell[1]);
>>  }
>>  
>> +static inline int __init fdt_prop_as_u32(
>> +const struct fdt_property *prop, uint32_t *val)
>> +{
>> +if ( !prop || fdt32_to_cpu(prop->len) < sizeof(u32) )
>> +return -EINVAL;
>> +
>> +*val = fdt_cell_as_u32((fdt32_t *)prop->data);
>> +return 0;
>> +}
>
> Patch 08 looks to (partly) open-code this. Perhaps better to introduce already
> there?

Already done.

Cheers,
Alejandro

Re: [PATCH v3 3/6] CI: remove now unused alpine-3.18-arm64-rootfs job and its container

2025-04-14 Thread Anthony PERARD

On Mon, Apr 14, 2025 at 12:09:00PM +0100, Andrew Cooper wrote:
> From: Marek Marczykowski-Górecki 
> 
> This got moved to test-artifacts.
> 
> Signed-off-by: Marek Marczykowski-Górecki 
> Reviewed-by: Andrew Cooper 

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PATCH v3 11/16] x86/hyperlaunch: locate dom0 initrd with hyperlaunch

2025-04-14 Thread Alejandro Vallejo

On Thu Apr 10, 2025 at 12:34 PM BST, Jan Beulich wrote:
> On 08.04.2025 18:07, Alejandro Vallejo wrote:
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -195,6 +195,35 @@ static int __init process_domain_node(
>>   !((char *)__va(bd->kernel->cmdline_pa))[0] )
>>  bd->kernel->fdt_cmdline = fdt_get_prop_offset(
>>  fdt, node, "bootargs", &bd->kernel->cmdline_pa);
>> +
>> +continue;
>
> With this ...
>
>> +}
>> +else if ( fdt_node_check_compatible(fdt, node,
>
> ... no need for "else" here?

Sure

>
>> +"multiboot,ramdisk") == 0 )
>> +{
>> +int idx;
>> +
>> +if ( bd->module )
>> +{
>> +printk(XENLOG_ERR "Duplicate ramdisk module for domain 
>> %s)\n",
>
> Stray ')' in the string literal.

Ack.

>
>> +   name);
>> +continue;
>> +}
>> +
>> +idx = fdt_read_multiboot_module(fdt, node, address_cells,
>> +size_cells,bi);
>> +if ( idx < 0 )
>> +{
>> +printk("  failed processing ramdisk module for domain %s\n",
>> +   name);
>> +return -EINVAL;
>> +}
>
> Along the lines of what Denis has said - please be consistent about log
> messages: XENLOG_* or not, preferably no capital at the start, initial
> blank padding. May apply elsewhere in the series as well.

I don't mind dropping that and making everything flat (uppercase + no
padding), but there is some consistency. Albeit, it is true the
rationale is somewhat obscure.

ATM the consistency is: "padding spaces + lowercase" when giving extra
information on hyperlaunch. It ends up creating a hyperlaunch block in
`dmesg` with a "Hyperlaunch detected" line on top so it's easier to
know what lines are hyperlaunch related and which ones aren't.

Do you have a preference for a specific reporting style?

>
>> +printk("  ramdisk: boot module %d\n", idx);
>> +bi->mods[idx].type = BOOTMOD_RAMDISK;
>> +bd->module = &bi->mods[idx];
>
> The field's named "module" now, but that now ends up inconsistent with
> naming used elsewhere, as is pretty noticeable here.

Well, yes. It is confusing. Also, the DTB is called multiboot,ramdisk,
because multiboot,module is already used to detect what nodes are
expressed as multiboot,modules. I'm considering going back and calling
them ramdisk again. If anything, to avoid the ambiguity between
domain modules and multiboot modules. e.g: a kernel is a multiboot
module, but not a domain module.

>
>> +continue;
>
> This isn't strictly needed, is it, ...
>
>>  }
>>  }
>
> ... considering we're at the bottom of the loop?

Indeed

>
> Jan

Cheers,
Alejandro

Re: [PATCH v3 4/6] CI: Switch to new argo artefact

2025-04-14 Thread Andrew Cooper

On 14/04/2025 6:24 pm, Anthony PERARD wrote:
> On Mon, Apr 14, 2025 at 12:09:01PM +0100, Andrew Cooper wrote:
>> diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
>> index 51229cbe561d..d46da1c43d05 100644
>> --- a/automation/gitlab-ci/test.yaml
>> +++ b/automation/gitlab-ci/test.yaml
>> @@ -242,7 +242,7 @@ xilinx-smoke-dom0-x86_64-gcc-debug-argo:
>>needs:
>>  - alpine-3.18-gcc-debug
>>  - project: xen-project/hardware/test-artifacts
>> -  job: x86_64-kernel-linux-6.6.56
>> +  job: linux-6.6.56-x86_64
>>ref: master
>>  - project: xen-project/hardware/test-artifacts
>>job: alpine-3.18-x86_64-rootfs
>
> Don't you need to remove the dependency on "x86_64-argo-linux-6.6.56"
> which is just out of context, as I think this is now part of
> "linux-6.6.56-x86_64" job.

Yes.  Sorry, this was a bad rebase taking out my "ref:
andrewcoop-test"'s through the series.

>
> Besides that:
> Reviewed-by: Anthony PERARD 

Thanks.

~Andrew

[PATCH v2 5/8] asm-generic: move some parts of Arm's domain_build.h to common

2025-04-14 Thread Oleksii Kurochko

Nothing changed. Only some function declarations are moved to xen/include/
headers as they are expected to be used by common code of domain building
or dom0less.

Signed-off-by: Oleksii Kurochko 
---
 Changes in v2:
  - Add missed declaration of construct_hwdom().
  - Drop unnecessary blank line.
  - Introduce xen/fdt-domain-build.h and move parts of Arm's domain_build.h to
it.
  - Update the commit message.
---
 xen/arch/arm/acpi/domain_build.c|  2 +-
 xen/arch/arm/dom0less-build.c   |  2 +-
 xen/arch/arm/domain_build.c |  2 +-
 xen/arch/arm/include/asm/domain_build.h | 18 +-
 xen/arch/arm/kernel.c   |  2 +-
 xen/arch/arm/static-shmem.c |  2 +-
 xen/include/xen/fdt-domain-build.h  | 46 +
 7 files changed, 52 insertions(+), 22 deletions(-)
 create mode 100644 xen/include/xen/fdt-domain-build.h

diff --git a/xen/arch/arm/acpi/domain_build.c b/xen/arch/arm/acpi/domain_build.c
index f9ca8b47e5..2b0768b7d5 100644
--- a/xen/arch/arm/acpi/domain_build.c
+++ b/xen/arch/arm/acpi/domain_build.c
@@ -10,6 +10,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -19,7 +20,6 @@
 #include 
 #include 
 #include 
-#include 
 
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 122739061c..ca78cff655 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -17,7 +18,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index a19914f836..75f048f58c 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -30,7 +31,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
diff --git a/xen/arch/arm/include/asm/domain_build.h 
b/xen/arch/arm/include/asm/domain_build.h
index 7136857ce4..5f9b063be1 100644
--- a/xen/arch/arm/include/asm/domain_build.h
+++ b/xen/arch/arm/include/asm/domain_build.h
@@ -5,27 +5,11 @@
 #include 
 
 typedef __be32 gic_interrupt_t[3];
-typedef bool (*alloc_domheap_mem_cb)(struct domain *d, struct page_info *pg,
- unsigned int order, void *extra);
-bool allocate_domheap_memory(struct domain *d, paddr_t tot_size,
- alloc_domheap_mem_cb cb, void *extra);
-bool allocate_bank_memory(struct kernel_info *kinfo, gfn_t sgfn,
-  paddr_t tot_size);
-void allocate_memory(struct domain *d, struct kernel_info *kinfo);
-int construct_domain(struct domain *d, struct kernel_info *kinfo);
-int construct_hwdom(struct kernel_info *kinfo);
 int domain_fdt_begin_node(void *fdt, const char *name, uint64_t unit);
-int make_chosen_node(const struct kernel_info *kinfo);
-int make_cpus_node(const struct domain *d, void *fdt);
-int make_hypervisor_node(struct domain *d, const struct kernel_info *kinfo,
- int addrcells, int sizecells);
-int make_memory_node(const struct kernel_info *kinfo, int addrcells,
- int sizecells, const struct membanks *mem);
 int make_psci_node(void *fdt);
-int make_timer_node(const struct kernel_info *kinfo);
 void evtchn_allocate(struct domain *d);
 
-unsigned int get_allocation_size(paddr_t size);
+int construct_hwdom(struct kernel_info *kinfo);
 
 /*
  * Helper to write an interrupts with the GIC format
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 5482cf4239..164f417e75 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -6,6 +6,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -17,7 +18,6 @@
 #include 
 
 #include 
-#include 
 #include 
 
 #define UIMAGE_MAGIC  0x27051956
diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index 14ae48fb1e..07ebd8b41f 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -1,11 +1,11 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 
 #include 
+#include 
 #include 
 #include 
 #include 
 
-#include 
 #include 
 #include 
 #include 
diff --git a/xen/include/xen/fdt-domain-build.h 
b/xen/include/xen/fdt-domain-build.h
new file mode 100644
index 00..41454e75ca
--- /dev/null
+++ b/xen/include/xen/fdt-domain-build.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __XEN_FDT_DOMAIN_BUILD_H__
+#define __XEN_FDT_DOMAIN_BUILD_H__
+
+#include 
+#include 
+#include 
+#include 
+
+#if __has_include()
+#   include 
+#endif
+
+struct domain;
+struct page_info;
+struct membanks;
+
+typedef bool (*alloc_domheap_mem_cb)(struct domain *d, struct page_info *pg,
+

[PATCH v2 3/8] asm-generic: move parts of Arm's asm/kernel.h to common code

2025-04-14 Thread Oleksii Kurochko

Move the following parts to common with the following changes:
- struct kernel_info:
  - Create arch_kernel_info for arch specific kernel information.
At the moment, it contains domain_type for Arm.
  - Rename vpl011 to vuart to have more generic name suitable for other archs.
  - s/phandle_gic/phandle_intc to have more generic name suitable for other
archs.
  - Make text_offset of zimage structure available for RISCV_64.
- Wrap by `#ifdef KERNEL_INFO_SHM_MEM_INIT` definition of KERNEL_SHM_MEM_INIT
  and wrap by `#ifndef KERNEL_INFO_INIT` definition of KERNEL_INFO_INIT to have
  ability to override KERNEL_INFO_SHM_MEM_INIT for arch in case it doesn't
  want to use generic one.
- Move DOM0LESS_* macros to dom0less-build.h.
- Move all others parts of Arm's kernel.h to xen/fdt-kernel.h.
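
As a rough illustration of the two preprocessor patterns described above (an
optional arch member guarded by __has_include, and an initializer the arch may
override), here is a minimal sketch. The header name "demo_arch_kernel.h", the
demo_* identifiers and the reduced structure layout are assumptions made for
the example; they are not the contents of xen/fdt-kernel.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical arch header probe: "demo_arch_kernel.h" is a made-up name
 * standing in for an arch-provided header that may or may not exist.
 */
#ifdef __has_include
#  if __has_include("demo_arch_kernel.h")
#    include "demo_arch_kernel.h"   /* would define struct arch_kernel_info */
#    define DEMO_HAS_ARCH_KERNEL_INFO 1
#  endif
#endif

/* Simplified stand-in for the common structure; not the real layout. */
struct demo_kernel_info {
    uint32_t phandle_intc;  /* generic name replacing phandle_gic */
    bool vuart;             /* generic name replacing vpl011 */
#ifdef DEMO_HAS_ARCH_KERNEL_INFO
    struct arch_kernel_info arch;   /* only present when the arch has one */
#endif
};

/* An arch may pre-define KERNEL_INFO_INIT; otherwise the generic one applies. */
#ifndef KERNEL_INFO_INIT
#define KERNEL_INFO_INIT { .phandle_intc = 0, .vuart = false }
#endif

/* Return a structure initialized with whichever initializer is in effect. */
static struct demo_kernel_info demo_kinfo_new(void)
{
    struct demo_kernel_info kinfo = KERNEL_INFO_INIT;

    return kinfo;
}
```

With no arch header present, the structure simply has no arch member and the
generic initializer is used.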

Because of the changes in struct kernel_info the corresponding parts of Arm's
code are updated.

As part of this patch the following clean up happens:
- Drop asm/setup.h from asm/kernel.h as nothing depends on it.
  Add inclusion of asm/setup.h for code which uses device_tree_get_reg() to
  avoid compilation issues for CONFIG_STATIC_MEMORY and CONFIG_STATIC_SHM.
- Drop inclusion of asm/kernel.h everywhere except xen/fdt-kernel.h.

Signed-off-by: Oleksii Kurochko 
---
Changes in v2:
 - Introduce xen/fdt-kernel.h.
 - Move DOM0LESS_* macros to dom0less-build.h.
 - Move the rest in asm-generic/kernel.h to xen/fdt-kernel.h.
 - Drop inclusion of asm/kernel.h everywhere except xen/fdt-kernel.h.
 - Wrap by #if __has_include() the member of kernel_info structure:
 struct arch_kernel_info arch.
 - Update the commit message.
---
 xen/arch/arm/acpi/domain_build.c |   2 +-
 xen/arch/arm/dom0less-build.c|  29 ++---
 xen/arch/arm/domain_build.c  |  12 +-
 xen/arch/arm/include/asm/domain_build.h  |   2 +-
 xen/arch/arm/include/asm/kernel.h| 126 +
 xen/arch/arm/include/asm/static-memory.h |   2 +-
 xen/arch/arm/include/asm/static-shmem.h  |   2 +-
 xen/arch/arm/kernel.c|  13 ++-
 xen/arch/arm/static-memory.c |   1 +
 xen/arch/arm/static-shmem.c  |   1 +
 xen/common/device-tree/dt-overlay.c  |   2 +-
 xen/include/asm-generic/dom0less-build.h |  28 +
 xen/include/xen/fdt-kernel.h | 133 +++
 13 files changed, 198 insertions(+), 155 deletions(-)
 create mode 100644 xen/include/xen/fdt-kernel.h

diff --git a/xen/arch/arm/acpi/domain_build.c b/xen/arch/arm/acpi/domain_build.c
index 2ce75543d0..f9ca8b47e5 100644
--- a/xen/arch/arm/acpi/domain_build.c
+++ b/xen/arch/arm/acpi/domain_build.c
@@ -10,6 +10,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -18,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 /* Override macros from asm/page.h to make them work with mfn_t */
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 7ec3f85795..5810083951 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -60,11 +61,11 @@ static int __init make_gicv2_domU_node(struct kernel_info 
*kinfo)
 if (res)
 return res;
 
-res = fdt_property_cell(fdt, "linux,phandle", kinfo->phandle_gic);
+res = fdt_property_cell(fdt, "linux,phandle", kinfo->phandle_intc);
 if (res)
 return res;
 
-res = fdt_property_cell(fdt, "phandle", kinfo->phandle_gic);
+res = fdt_property_cell(fdt, "phandle", kinfo->phandle_intc);
 if (res)
 return res;
 
@@ -131,11 +132,11 @@ static int __init make_gicv3_domU_node(struct kernel_info 
*kinfo)
 if (res)
 return res;
 
-res = fdt_property_cell(fdt, "linux,phandle", kinfo->phandle_gic);
+res = fdt_property_cell(fdt, "linux,phandle", kinfo->phandle_intc);
 if (res)
 return res;
 
-res = fdt_property_cell(fdt, "phandle", kinfo->phandle_gic);
+res = fdt_property_cell(fdt, "phandle", kinfo->phandle_intc);
 if (res)
 return res;
 
@@ -196,7 +197,7 @@ static int __init make_vpl011_uart_node(struct kernel_info 
*kinfo)
 return res;
 
 res = fdt_property_cell(fdt, "interrupt-parent",
-kinfo->phandle_gic);
+kinfo->phandle_intc);
 if ( res )
 return res;
 
@@ -482,10 +483,10 @@ static int __init domain_handle_dtb_bootmodule(struct 
domain *d,
  */
 if ( dt_node_cmp(name, "gic") == 0 )
 {
-uint32_t phandle_gic = fdt_get_phandle(pfdt, node_next);
+uint32_t phandle_intc = fdt_get_phandle(pfdt, node_next);
 
-if ( phandle_gic != 0 )
-kinfo->phandle_gic = phandle_gic;
+if ( phandle_intc != 0 )
+kinfo->phandle_intc = phandle_intc;
 continue;
 }
 
@@ -528,7 +529,7 @@ static int __init

[PATCH v4] xen/riscv: Increase XEN_VIRT_SIZE

2025-04-14 Thread Oleksii Kurochko

A randconfig job failed with the following issue:
  riscv64-linux-gnu-ld: Xen too large for early-boot assumptions

The reason is that enabling the UBSAN config increased the size of
the Xen binary.

Increase XEN_VIRT_SIZE to reserve enough space, allowing both UBSAN
and GCOV to be enabled together, with some slack for future growth.

Additionally, add checks to verify that XEN_VIRT_START is 1GB-aligned
and XEN_VIRT_SIZE is 2MB-aligned to reduce the number of page tables
needed for the initial mapping. In the future, when 2MB mappings are
used for .text (rx), .rodata (r), and .data (rw), this will also help
reduce TLB pressure.
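
The alignment requirements and the resulting layout can be checked with a
small stand-alone sketch. The sizes (16MB for Xen, 2MB gap, 4MB FDT) come from
this patch; the XEN_VIRT_START value, the helper macros as written here and
the demo_* functions are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)
#define GB(x) ((uint64_t)(x) << 30)
#define IS_ALIGNED(val, align) (((val) & ((align) - 1)) == 0)

/* Assumed start address for illustration; sizes are taken from the patch. */
#define XEN_VIRT_START      UINT64_C(0xFFFFFFFFC0000000)
#define XEN_VIRT_SIZE       MB(16)
#define GAP_SIZE            MB(2)
#define BOOT_FDT_VIRT_START (XEN_VIRT_START + XEN_VIRT_SIZE + GAP_SIZE)
#define BOOT_FDT_VIRT_SIZE  MB(4)

/* Build-time equivalents of the alignment checks described above. */
_Static_assert(IS_ALIGNED(XEN_VIRT_START, GB(1)),
               "XEN_VIRT_START must be 1GB-aligned");
_Static_assert(IS_ALIGNED(XEN_VIRT_SIZE, MB(2)),
               "XEN_VIRT_SIZE must be 2MB-aligned");

/* Last byte of a region that starts at 'start' and spans 'size' bytes. */
static uint64_t demo_region_end(uint64_t start, uint64_t size)
{
    return start + size - 1;
}

/* Start of the region that follows, leaving one GAP_SIZE hole in between. */
static uint64_t demo_next_region_start(uint64_t start, uint64_t size)
{
    return start + size + GAP_SIZE;
}
```

Under these assumptions Xen ends at 0x...c0ffffff, the FDT starts at
0x...c1200000 and the region after the FDT starts at 0x...c1800000.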

Reported-by: Andrew Cooper 
Signed-off-by: Oleksii Kurochko 
---
Changes in v4:
 - Move is_init_section() to xen/sections.h. Add const for
   declaration of `p` variable inside is_init_section() and
   for the cast.
 - Update the comment above ASSERT() with .init* section range:
   s/[__init_begin, __init_end]/[__init_begin, __init_end).
 - Update ASSERT condition:
   s/"system_state != SYS_STATE_active"/"system_state < SYS_STATE_active".
 - Drop MB after XEN_VIRT_SIZE in the comment above PGTBL_INITIAL_COUNT
   as XEN_VIRT_SIZE expands to MB(16).
 - Fix typos:
   s/separetely/separately
   s/indenity/identity
 - Add lost L0 table for identity mapping to PGTBL_INITIAL_COUNT.
 - Move alignment checks of XEN_VIRT_START and XEN_VIRT_SIZE
   closer to the definition of PGTBL_INITIAL_COUNT.
 - Update the commit message.
---
Changes in v3:
 - Add ASSERT which checks .init* sections range. When Xen ends boot
   init* sections are going to be released.
 - Introduce is_init_section() macros.
 - Correct fixmap end address in RISCV-64 layout table.
 - Update ASSERT() which checks that `va` is in Xen virtual address
   range and drop BUILD_BUG_ON() as it isn't necessary anymore with
   the way how the ASSERT() looks now.
 - Add ASSERT() which checks that XEN_VIRT_START is 1gb aligned and
   add ASSERT() which checks that XEN_VIRT_SIZE is 2mb aligned.
   It helps us to reduce an amount of PGTBL_INITIAL_COUNT.
 - Update PGTBL_INITIAL_COUNT and the comment above.
 - Update the commit message.
---
Changes in v2:
 - Increase XEN_VIRT_SIZE to 16 MB to cover also the case if 2M mappings will
   be used for .text (rx), .rodata(r), and .data (rw).
 - Update layout table in config.h.
 - s/xen_virt_starn_vpn/xen_virt_start_vpn
 - Update BUILD_BUG_ON(... != MB(8)) check to "... > GB(1)".
 - Update definition of PGTBL_INITIAL_COUNT and the comment above.
---
 xen/arch/riscv/include/asm/config.h |  8 
 xen/arch/riscv/include/asm/mm.h | 15 ---
 xen/arch/riscv/mm.c | 28 +++-
 xen/include/xen/sections.h  |  4 
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/xen/arch/riscv/include/asm/config.h 
b/xen/arch/riscv/include/asm/config.h
index 7141bd9e46..5eba626f27 100644
--- a/xen/arch/riscv/include/asm/config.h
+++ b/xen/arch/riscv/include/asm/config.h
@@ -41,11 +41,11 @@
  * Start addr  | End addr | Slot   | area description
  * 
  *   . L2 511  Unused
- *  0xc0a0  0xc0bf L2 511  Fixmap
+ *  0xc180  0xc19f L2 511  Fixmap
  *   . ( 2 MB gap )
- *  0xc040  0xc07f L2 511  FDT
+ *  0xc120  0xc15f L2 511  FDT
  *   . ( 2 MB gap )
- *  0xc000  0xc01f L2 511  Xen
+ *  0xc000  0xc0ff L2 511  Xen
  *   . L2 510  Unused
  *  0x320x7f7fff   L2 200-509  Direct map
  *   . L2 199  Unused
@@ -78,7 +78,7 @@
 
 #define GAP_SIZEMB(2)
 
-#define XEN_VIRT_SIZE   MB(2)
+#define XEN_VIRT_SIZE   MB(16)
 
 #define BOOT_FDT_VIRT_START (XEN_VIRT_START + XEN_VIRT_SIZE + GAP_SIZE)
 #define BOOT_FDT_VIRT_SIZE  MB(4)
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 4035cd400a..ef8b35d7c2 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -43,13 +44,21 @@ static inline void *maddr_to_virt(paddr_t ma)
  */
 static inline unsigned long virt_to_maddr(unsigned long va)
 {
+const unsigned long xen_size = (unsigned long)(_end - _start);
+const unsigned long xen_virt_start = _AC(XEN_VIRT_START, UL);
+const unsigned long xen_virt_end = xen_virt_start + xen_size - 1;
+
 if ((va >= DIRECTMAP_VIRT_START) &&
 (va <= DIRECTMAP_VIRT_END))
 return directmapoff_to_maddr(va - directmap_virt_start);
 
-BUILD_BUG_ON(XEN_VIRT_SIZE != MB(2));
-ASSERT((

Re: [PATCH 3/5] x86/hvm: fix handling of accesses to partial r/o MMIO pages

2025-04-14 Thread Roger Pau Monné

On Mon, Apr 14, 2025 at 05:24:32PM +0200, Jan Beulich wrote:
> On 14.04.2025 15:53, Roger Pau Monné wrote:
> > On Mon, Apr 14, 2025 at 08:33:44AM +0200, Jan Beulich wrote:
> >> On 11.04.2025 12:54, Roger Pau Monne wrote:
> >>> @@ -1981,7 +2056,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
> >>> long gla,
> >>>   */
> >>>  if ( (p2mt == p2m_mmio_dm) ||
> >>>   (npfec.write_access &&
> >>> -  (p2m_is_discard_write(p2mt) || (p2mt == p2m_ioreq_server))) )
> >>> +  (p2m_is_discard_write(p2mt) || (p2mt == p2m_ioreq_server) ||
> >>> +   /* MMIO entries can be r/o if the target mfn is in 
> >>> mmio_ro_ranges. */
> >>> +   (p2mt == p2m_mmio_direct))) )
> >>>  {
> >>>  if ( !handle_mmio_with_translation(gla, gfn, npfec) )
> >>>  hvm_inject_hw_exception(X86_EXC_GP, 0);
> >>
> >> Aren't we handing too many things to handle_mmio_with_translation() this
> >> way? At the very least you're losing ...
> >>
> >>> @@ -2033,14 +2110,6 @@ int hvm_hap_nested_page_fault(paddr_t gpa, 
> >>> unsigned long gla,
> >>>  goto out_put_gfn;
> >>>  }
> >>>  
> >>> -if ( (p2mt == p2m_mmio_direct) && npfec.write_access && 
> >>> npfec.present &&
> >>
> >> ... the .present check.
> > 
> > Isn't the p2mt == p2m_mmio_direct check already ensuring the entry is
> > present?  Otherwise its type would be p2m_invalid or p2m_mmio_dm?
> 
> Yes (to the 1st question), it kind of is.
> 
> > It did seem to me the other checks in this function already assume
> > that by having a valid type the entry is present.
> 
> Except for the code above, where we decided to play safe. At the very least
> if you drop such a check, please say a justifying word in the description.

I've added:

"As part of the fix r/o MMIO accesses are now handled by
handle_mmio_with_translation(), re-using the same logic that was used
for other read-only types part of p2m_is_discard_write().  The page
present check is dropped as type p2m_mmio_direct must have the
present bit set in the PTE."

Let me know if you think that's enough.
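
For readers following the thread, the condition from the quoted hunk can be
modelled in isolation roughly as below. This is only a sketch: the enum is a
reduced, renamed subset of the p2m types, and treating just the read-only RAM
type as "discard write" is an assumption of the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, illustrative set of p2m types; the real enum has many more. */
typedef enum {
    demo_p2m_ram_rw,
    demo_p2m_ram_ro,
    demo_p2m_mmio_dm,
    demo_p2m_mmio_direct,
    demo_p2m_ioreq_server,
} demo_p2m_type_t;

/* Assumption: only ram_ro is modelled as a discard-write type here. */
static bool demo_is_discard_write(demo_p2m_type_t t)
{
    return t == demo_p2m_ram_ro;
}

/*
 * Mirrors the shape of the patched condition: emulated MMIO always goes to
 * the MMIO handler, while the other listed types do so only on writes.
 */
static bool demo_goes_to_mmio_emulation(demo_p2m_type_t t, bool write_access)
{
    return t == demo_p2m_mmio_dm ||
           (write_access &&
            (demo_is_discard_write(t) ||
             t == demo_p2m_ioreq_server ||
             t == demo_p2m_mmio_direct));
}
```

In this model a write to a p2m_mmio_direct page reaches the emulation path
regardless of any separate "present" flag, which is the behaviour change being
discussed.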

> >> I'm also concerned of e.g. VT-x'es APIC access MFN, which is
> >> p2m_mmio_direct.
> > 
> > But that won't go into hvm_hap_nested_page_fault() when using
> > cpu_has_vmx_virtualize_apic_accesses (and thus having an APIC page
> > mapped as p2m_mmio_direct)?
> > 
> > It would instead be an EXIT_REASON_APIC_ACCESS vmexit which is handled
> > differently?
> 
> All true as long as things work as expected (potentially including the guest
> also behaving as expected). Also this was explicitly only an example I could
> readily think of. I'm simply wary of handle_mmio_with_translation() now
> getting things to handle it's not meant to ever see.

How was access to MMIO r/o regions supposed to be handled before
33c19df9a5a0 (~2015)?  I see that setting r/o MMIO p2m entries was
added way before to p2m_type_to_flags() and ept_p2m_type_to_flags()
(~2010), yet I can't figure out how writes would be handled back then
that didn't result in a p2m fault and crashing of the domain.

I'm happy to look at other ways to handling this, but given there's
current logic for handling accesses to read-only regions in
hvm_hap_nested_page_fault() I think re-using that was the best way to
also handle accesses to MMIO read-only regions.

Arguably it would already be the case that for other reasons Xen would
need to emulate an instruction that accesses a read-only MMIO region?

Thanks, Roger.

Re: [PATCH v1 02/14] xen/riscv: introduce smp_clear_cpu_maps()

2025-04-14 Thread Oleksii Kurochko



On 4/10/25 3:10 PM, Jan Beulich wrote:

On 08.04.2025 17:57, Oleksii Kurochko wrote:

Initialize cpu_{possible, online, present}_map by using smp_clear_cpu_maps().

Drop DEFINE_PER_CPU(unsigned int, cpu_id) from stubs.c as this variable isn't
expected to be used in RISC-V at all.

Move declaration of cpu_{possible,online,present}_map from stubs.c to smpboot.c
as smpboot.c is now introduced.
Other definitions are kept in stubs.c as they are not initialized and not needed at
the moment.

Signed-off-by: Oleksii Kurochko
---
  xen/arch/riscv/Makefile  |  1 +
  xen/arch/riscv/include/asm/smp.h |  2 ++
  xen/arch/riscv/setup.c   |  2 ++
  xen/arch/riscv/smpboot.c | 15 +++
  xen/arch/riscv/stubs.c   |  6 --
  5 files changed, 20 insertions(+), 6 deletions(-)
  create mode 100644 xen/arch/riscv/smpboot.c

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 0c6c4a38a3..f551bf32a2 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -10,6 +10,7 @@ obj-y += sbi.o
  obj-y += setup.o
  obj-y += shutdown.o
  obj-y += smp.o
+obj-y += smpboot.o
  obj-y += stubs.o
  obj-y += time.o
  obj-y += traps.o
diff --git a/xen/arch/riscv/include/asm/smp.h b/xen/arch/riscv/include/asm/smp.h
index 5e170b57b3..188c033718 100644
--- a/xen/arch/riscv/include/asm/smp.h
+++ b/xen/arch/riscv/include/asm/smp.h
@@ -26,6 +26,8 @@ static inline void set_cpuid_to_hartid(unsigned long cpuid,
  
  void setup_tp(unsigned int cpuid);
  
+void smp_clear_cpu_maps(void);

+
  #endif
  
  /*

diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 4e416f6e44..7f68f3f5b7 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -72,6 +72,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
  
  remove_identity_mapping();
  
+smp_clear_cpu_maps();

+
  set_processor_id(0);
  
  set_cpuid_to_hartid(0, bootcpu_id);

diff --git a/xen/arch/riscv/smpboot.c b/xen/arch/riscv/smpboot.c
new file mode 100644
index 00..0f4dcc28e1
--- /dev/null
+++ b/xen/arch/riscv/smpboot.c
@@ -0,0 +1,15 @@
+#include 
+#include 
+
+cpumask_t cpu_online_map;
+cpumask_t cpu_present_map;
+cpumask_t cpu_possible_map;

__read_mostly for all of them, perhaps (if CPU hotplug isn't expected to
be supported) even __ro_after_init for the latter two?


We have been living without CPU hotplug support for a long time in the
downstream branch, but I can't say whether it is expected to be supported in
the future or not. To ensure we can add such an option later without changing
the attributes of the cpu_online_map variable, I prefer to use __read_mostly
here and __ro_after_init for cpu_possible_map.



As to cpu_possible_map - do you predict that you'll actually use it? Arm
does (and instead has only a fake cpu_present_map), but on x86 we get away
without.


I checked how it is used in the latest downstream branch, and it isn't really
used except during initialization, in smp_clear_cpu_maps() and
smp_prepare_cpus(), so we can skip it for RISC-V too.




+void __init smp_clear_cpu_maps(void)
+{
+cpumask_clear(&cpu_possible_map);
+cpumask_clear(&cpu_online_map);

What's the point of these? All three maps start out fully zeroed.


It could indeed be dropped. I saw your patch for Arm; I'll align the current
patch with those changes.





+cpumask_set_cpu(0, &cpu_possible_map);
+cpumask_set_cpu(0, &cpu_online_map);

These are contradicting the name of the function. The somewhat equivalent
function we have on x86 is smp_prepare_boot_cpu().


+cpumask_copy(&cpu_present_map, &cpu_possible_map);

Another cpumask_set_cpu() is probably cheaper here then.


What do you mean by cheaper here?
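
One plausible reading of the remark, sketched with a toy bitmap type: setting
a single CPU touches one word, whereas a copy moves the whole mask. The
demo_* names and the 256-CPU size are assumptions for illustration; this is
not Xen's cpumask implementation.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_NR_CPUS       256
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_MASK_LONGS \
    ((DEMO_NR_CPUS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Toy CPU mask: one bit per CPU, stored in an array of words. */
typedef struct {
    unsigned long bits[DEMO_MASK_LONGS];
} demo_cpumask_t;

/* Set a single bit: only one word of the mask is read and written. */
static void demo_cpumask_set_cpu(unsigned int cpu, demo_cpumask_t *m)
{
    m->bits[cpu / DEMO_BITS_PER_LONG] |= 1UL << (cpu % DEMO_BITS_PER_LONG);
}

/* Copy a mask: every word is transferred, however few bits are set. */
static void demo_cpumask_copy(demo_cpumask_t *dst, const demo_cpumask_t *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/* Compare two masks word by word. */
static int demo_cpumask_equal(const demo_cpumask_t *a, const demo_cpumask_t *b)
{
    return memcmp(a->bits, b->bits, sizeof(a->bits)) == 0;
}
```

When the source mask is known to contain only CPU 0, both approaches leave the
destination in the same state.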

~ Oleksii

[PATCH v2 8/8] xen/common: dom0less: introduce common dom0less-build.c

2025-04-14 Thread Oleksii Kurochko

Part of Arm's dom0less-build.c could be common between architectures which are
using device tree files to create guest domains. Thereby move some parts of
Arm's dom0less-build.c to common code with minor changes.

As a part of these changes the following changes are introduced:
- Introduce make_arch_nodes() to cover arch-specific nodes. For example, in
  case of Arm, it is PSCI and vpl011 nodes.
- Introduce set_domain_type() to abstract a way how setting of domain type
  happens. For example, RISC-V won't have this member of arch_domain structure
  as vCPUs will always have the same bitness as hypervisor. In case of Arm, it
  is possible that Arm64 could create 32-bit and 64-bit domains.
- Introduce init_vuart() to cover details of virtual uart initialization.
- Introduce init_intc_phandle() to cover some details of interrupt controller
  phandle initialization. As an example, RISC-V could have different name for
  interrupt controller node ( APLIC, PLIC, IMSIC, etc ) but the code in
  domain_handle_dtb_bootmodule() could handle only one interrupt controller
  node name.
- s/make_gic_domU_node/make_intc_domU_node as GIC is Arm specific naming and
  add prototype of make_intc_domU_node() to dom0less-build.h
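
To illustrate the kind of split the hooks above aim for, here is a sketch of
common code delegating interrupt controller detection to an arch hook. All
signatures, the demo_* names, the node structure and the use of strcmp() in
place of the device tree helpers are assumptions of the example, not the
interfaces added by this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct demo_kernel_info {
    uint32_t phandle_intc;
};

struct demo_node {
    const char *name;
    uint32_t phandle;
};

/* Hypothetical hook type: returns 1 if the node was an interrupt controller. */
typedef int (*demo_intc_hook_t)(struct demo_kernel_info *kinfo,
                                const char *name, uint32_t phandle);

/* Arm flavour: only a node named "gic" is recognised. */
static int demo_arm_init_intc_phandle(struct demo_kernel_info *kinfo,
                                      const char *name, uint32_t phandle)
{
    if ( strcmp(name, "gic") != 0 )
        return 0;

    if ( phandle != 0 )
        kinfo->phandle_intc = phandle;

    return 1;
}

/* RISC-V flavour: several node names are accepted (names are illustrative). */
static int demo_riscv_init_intc_phandle(struct demo_kernel_info *kinfo,
                                        const char *name, uint32_t phandle)
{
    if ( strcmp(name, "aplic") != 0 && strcmp(name, "plic") != 0 &&
         strcmp(name, "imsic") != 0 )
        return 0;

    if ( phandle != 0 )
        kinfo->phandle_intc = phandle;

    return 1;
}

/*
 * Common code: let the arch hook claim interrupt controller nodes and return
 * how many nodes are left for generic handling.
 */
static unsigned int demo_scan_nodes(struct demo_kernel_info *kinfo,
                                    const struct demo_node *nodes,
                                    unsigned int nr, demo_intc_hook_t hook)
{
    unsigned int i, remaining = 0;

    for ( i = 0; i < nr; i++ )
        if ( !hook(kinfo, nodes[i].name, nodes[i].phandle) )
            remaining++;

    return remaining;
}
```

The common loop never mentions "gic", so a different architecture only has to
supply its own hook.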

The following functions are moved to xen/common/device-tree:
- Functions which are moved as is:
  - domain_p2m_pages().
  - handle_passthrough_prop().
  - handle_prop_pfdt().
  - scan_pfdt_node().
  - check_partial_fdt().
- Functions which are moved with some minor changes:
  - alloc_xenstore_evtchn():
- ifdef-ing by CONFIG_HVM accesses to hvm.params.
  - prepare_dtb_domU():
- ifdef-ing access to gnttab_{start,size} by CONFIG_GRANT_TABLE.
- s/make_gic_domU_node/make_intc_domU_node.
- Add call of make_arch_nodes().
- domain_handle_dtb_bootmodule():
  - hide details of interrupt controller phandle initialization by calling
init_intc_phandle().
  - Update the comment above init_intc_phandle(): s/gic/interrupt controller.
- construct_domU():
  - ifdef-ing by CONFIG_HVM accesses to hvm.params.
  - Call init_vuart() to hide Arm's vpl011_init() details there.
  - Add call of set_domain_type() instead of setting kinfo->arch.type 
explicitly.

Some parts of dom0less-build.c are wrapped by #ifdef CONFIG_STATIC_{SHMEM,MEMORY}
as not all archs support these configs.

Signed-off-by: Oleksii Kurochko 
---
Change in v2:
 - Wrap by #ifdef CONFIG_STATIC_* inclusions of  and
   . Wrap also the code which uses something from the
   mentioned headers.
 - Add handling of legacy case in construct_domU().
 - Use xen/fdt-kernel.h and xen/fdt-domain-build.h instead of asm/*.
 - Update the commit message.
---
 xen/arch/arm/dom0less-build.c| 721 ++
 xen/common/device-tree/dom0less-build.c  | 730 +++
 xen/include/asm-generic/dom0less-build.h |  14 +
 3 files changed, 779 insertions(+), 686 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index ca78cff655..463c38ae6c 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -147,7 +147,7 @@ static int __init make_gicv3_domU_node(struct kernel_info 
*kinfo)
 }
 #endif
 
-static int __init make_gic_domU_node(struct kernel_info *kinfo)
+int __init make_intc_domU_node(struct kernel_info *kinfo)
 {
 switch ( kinfo->d->arch.vgic.version )
 {
@@ -213,729 +213,62 @@ static int __init make_vpl011_uart_node(struct 
kernel_info *kinfo)
 }
 #endif
 
-/*
- * Scan device tree properties for passthrough specific information.
- * Returns < 0 on error
- * 0 on success
- */
-static int __init handle_passthrough_prop(struct kernel_info *kinfo,
-  const struct fdt_property *xen_reg,
-  const struct fdt_property *xen_path,
-  bool xen_force,
-  uint32_t address_cells,
-  uint32_t size_cells)
-{
-const __be32 *cell;
-unsigned int i, len;
-struct dt_device_node *node;
-int res;
-paddr_t mstart, size, gstart;
-
-/* xen,reg specifies where to map the MMIO region */
-cell = (const __be32 *)xen_reg->data;
-len = fdt32_to_cpu(xen_reg->len) / ((address_cells * 2 + size_cells) *
-sizeof(uint32_t));
-
-for ( i = 0; i < len; i++ )
-{
-device_tree_get_reg(&cell, address_cells, size_cells,
-&mstart, &size);
-gstart = dt_next_cell(address_cells, &cell);
-
-if ( gstart & ~PAGE_MASK || mstart & ~PAGE_MASK || size & ~PAGE_MASK )
-{
-printk(XENLOG_ERR
-   "DomU passthrough config has not page aligned 
addresses/sizes\n");
-return -EINVAL;
-}
-
-res = iomem_permit_access(kinfo->d, paddr_to_pfn(mstart),
-  paddr_to_pfn(PAGE_ALIGN(mstart + size - 1)));
-

[PATCH v2 4/8] arm/static-shmem.h: drop inclusion of asm/setup.h

2025-04-14 Thread Oleksii Kurochko

Nothing in asm/static-shmem.h depends on asm/setup.h, so the inclusion of
asm/setup.h is dropped.

After this drop, compilation errors related to implicit declaration of
device_tree_get_reg(), map_device_irqs_to_domain() and device_tree_get_u32()
occur during compilation of dom0less-build.c (as they are declared in
asm/setup.h).

Add inclusion of  in dt-overlay.c as it is using handle_device()
declared in .

Signed-off-by: Oleksii Kurochko 
---
Changes in V2:
 - Nothing changed. Only rebase.
---
 xen/arch/arm/dom0less-build.c   | 1 +
 xen/common/device-tree/dt-overlay.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 5810083951..122739061c 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
diff --git a/xen/common/device-tree/dt-overlay.c 
b/xen/common/device-tree/dt-overlay.c
index 81107cb48d..d184186c01 100644
--- a/xen/common/device-tree/dt-overlay.c
+++ b/xen/common/device-tree/dt-overlay.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 
+#include 
+
 #define DT_OVERLAY_MAX_SIZE KB(500)
 
 static LIST_HEAD(overlay_tracker);
-- 
2.49.0

[PATCH v3 2/6] CI: avoid repacking initrd as part of the test job

2025-04-14 Thread Andrew Cooper

From: Marek Marczykowski-Górecki 

Use the new test-artifacts which provide rootfs.cpio.gz rather than
initrd.tar.gz.  rootfs.cpio.gz also has all the necessary top-level
directories, and includes the rc_verbose setting, so these modifications can
be dropped.

Having that, do not repack the whole initrd, but only pack modified
files and rely on Linux handling of concatenated archives.
This allows packing just test-related files (which includes the whole
toolstack), instead of the whole initrd.

For xilinx-smoke-dom0-x86_64.sh, this involves instructing grub not to unzip
the archive, as doing so corrupts it.

Signed-off-by: Marek Marczykowski-Górecki 
Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Stefano Stabellini 
CC: Michal Orzel 
CC: Doug Goldstein 
CC: Marek Marczykowski-Górecki 

v3:
 * Tested (bugfixed) on xilinx-* runners
 * Rearrange logic so the order of concatenation is clearer (relevant for
   subsequent patches)

https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/1765676583
---
 automation/gitlab-ci/test.yaml|  8 +++--
 automation/scripts/qemu-alpine-x86_64.sh  | 16 --
 automation/scripts/qemu-smoke-dom0-arm64.sh   | 14 
 .../scripts/qemu-smoke-dom0less-arm64.sh  | 15 -
 automation/scripts/qubes-x86-64.sh| 32 +++
 .../scripts/xilinx-smoke-dom0-x86_64.sh   | 27 
 .../scripts/xilinx-smoke-dom0less-arm64.sh| 30 +++--
 7 files changed, 61 insertions(+), 81 deletions(-)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 59a2de28c864..51229cbe561d 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -11,7 +11,9 @@
   - project: xen-project/hardware/test-artifacts
 job: linux-6.6.86-arm64
 ref: master
-  - alpine-3.18-arm64-rootfs-export
+  - project: xen-project/hardware/test-artifacts
+job: alpine-3.18-arm64-rootfs
+ref: master
   - qemu-system-aarch64-6.0.0-arm64-export
 
 .arm32-test-needs: &arm32-test-needs
@@ -22,7 +24,7 @@
 job: linux-6.6.56-x86_64
 ref: master
   - project: xen-project/hardware/test-artifacts
-job: x86_64-rootfs-alpine-3.18
+job: alpine-3.18-x86_64-rootfs
 ref: master
 
 .qemu-arm64:
@@ -243,7 +245,7 @@ xilinx-smoke-dom0-x86_64-gcc-debug-argo:
   job: x86_64-kernel-linux-6.6.56
   ref: master
 - project: xen-project/hardware/test-artifacts
-  job: x86_64-rootfs-alpine-3.18
+  job: alpine-3.18-x86_64-rootfs
   ref: master
 - project: xen-project/hardware/test-artifacts
   job: x86_64-argo-linux-6.6.56
diff --git a/automation/scripts/qemu-alpine-x86_64.sh 
b/automation/scripts/qemu-alpine-x86_64.sh
index 569bd766d31e..c7dd12197862 100755
--- a/automation/scripts/qemu-alpine-x86_64.sh
+++ b/automation/scripts/qemu-alpine-x86_64.sh
@@ -28,16 +28,14 @@ cd initrd
 find . | cpio -H newc -o | gzip > ../domU-rootfs.cpio.gz
 cd ..
 
-# initrd.tar.gz is Dom0 rootfs
+# Dom0 rootfs
+cp rootfs.cpio.gz dom0-rootfs.cpio.gz
+
+# test-local configuration
 mkdir -p rootfs
 cd rootfs
-tar xvzf ../initrd.tar.gz
-mkdir proc
-mkdir run
-mkdir srv
-mkdir sys
-rm var/run
 cp -ar ../dist/install/* .
+mkdir -p root etc/local.d
 mv ../domU-rootfs.cpio.gz ./root
 cp ../bzImage ./root
 echo "name=\"domU\"
@@ -60,9 +58,7 @@ xl -vvv create -c /root/domU.cfg
 
 " > etc/local.d/xen.start
 chmod +x etc/local.d/xen.start
-echo "rc_verbose=yes" >> etc/rc.conf
-# rebuild Dom0 rootfs
-find . | cpio -H newc -o | gzip > ../dom0-rootfs.cpio.gz
+find . | cpio -H newc -o | gzip >> ../dom0-rootfs.cpio.gz
 cd ../..
 
 cat >> binaries/pxelinux.0 << EOF
diff --git a/automation/scripts/qemu-smoke-dom0-arm64.sh 
b/automation/scripts/qemu-smoke-dom0-arm64.sh
index e8e49ded245a..c0cf61ff8f7b 100755
--- a/automation/scripts/qemu-smoke-dom0-arm64.sh
+++ b/automation/scripts/qemu-smoke-dom0-arm64.sh
@@ -27,15 +27,14 @@ cd initrd
 find . | cpio -H newc -o | gzip > ../domU-rootfs.cpio.gz
 cd ..
 
+# Dom0 rootfs
+cp rootfs.cpio.gz dom0-rootfs.cpio.gz
+
+# test-local configuration
 mkdir -p rootfs
 cd rootfs
-tar xvzf ../initrd.tar.gz
-mkdir proc
-mkdir run
-mkdir srv
-mkdir sys
-rm var/run
 cp -ar ../dist/install/* .
+mkdir -p etc/local.d root
 mv ../domU-rootfs.cpio.gz ./root
 cp ../Image ./root
 echo "name=\"domU\"
@@ -56,8 +55,7 @@ xl -vvv create -c /root/domU.cfg
 
 " > etc/local.d/xen.start
 chmod +x etc/local.d/xen.start
-echo "rc_verbose=yes" >> etc/rc.conf
-find . | cpio -H newc -o | gzip > ../dom0-rootfs.cpio.gz
+find . | cpio -H newc -o | gzip >> ../dom0-rootfs.cpio.gz
 cd ../..
 
 # XXX QEMU looks for "efi-virtio.rom" even if it is unneeded
diff --git a/automation/scripts/qemu-smoke-dom0less-arm64.sh 
b/automation/scripts/qemu-smoke-dom0less-arm64.sh
index f72d20936181..8e939f0b7214 100755
--- a/automation/scripts/qemu-smoke-dom0less-arm64.sh
+++ b/automation/scripts/qemu-smoke-dom0less-arm64.sh
@@ -114,16 +114,14 @@ cd initrd
 find . | cpio --create --forma

Re: [PATCH v3 4/6] CI: Switch to new argo artefact

2025-04-14 Thread Anthony PERARD

On Mon, Apr 14, 2025 at 12:09:01PM +0100, Andrew Cooper wrote:
> diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> index 51229cbe561d..d46da1c43d05 100644
> --- a/automation/gitlab-ci/test.yaml
> +++ b/automation/gitlab-ci/test.yaml
> @@ -242,7 +242,7 @@ xilinx-smoke-dom0-x86_64-gcc-debug-argo:
>needs:
>  - alpine-3.18-gcc-debug
>  - project: xen-project/hardware/test-artifacts
> -  job: x86_64-kernel-linux-6.6.56
> +  job: linux-6.6.56-x86_64
>ref: master
>  - project: xen-project/hardware/test-artifacts
>job: alpine-3.18-x86_64-rootfs


Don't you need to remove the dependency on "x86_64-argo-linux-6.6.56"
which is just out of context, as I think this is now part of
"linux-6.6.56-x86_64" job?

Besides that:
Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PATCH v3 08/16] x86/hyperlaunch: Add helpers to locate multiboot modules

2025-04-14 Thread Jan Beulich

On 14.04.2025 15:37, Alejandro Vallejo wrote:
> On Thu Apr 10, 2025 at 11:42 AM BST, Jan Beulich wrote:
>> On 08.04.2025 18:07, Alejandro Vallejo wrote:
>>> +/*
>>> + * Locate a multiboot module given its node offset in the FDT.
>>> + *
>>> + * The module location may be given via either FDT property:
>>> + * * reg = 
>>> + * * Mutates `bi` to append the module.
>>> + * * module-index = 
>>> + * * Leaves `bi` unchanged.
>>> + *
>>> + * @param fdt   Pointer to the full FDT.
>>> + * @param node  Offset for the module node.
>>> + * @param address_cells Number of 4-octet cells that make up an "address".
>>> + * @param size_cellsNumber of 4-octet cells that make up a "size".
>>> + * @param bi[inout] Xen's representation of the boot parameters.
>>> + * @return  -EINVAL on malformed nodes, otherwise
>>> + *  index inside `bi->mods`
>>> + */
>>> +int __init fdt_read_multiboot_module(const void *fdt, int node,
>>> + int address_cells, int size_cells,
>>> + struct boot_info *bi)
>>
>> Functions without callers and non-static ones without declarations are
>> disliked by Misra.
> 
> Can't do much about it if I want them to stand alone in a single patch.
> Otherwise the following ones become quite unwieldy to look at. All I can
> say is that this function becomes static and with a caller on the next
> patch.

Which means you need to touch this again anyway. Perhaps we need a Misra
deviation for __maybe_unused functions / data, in which case you could
use that here and strip it along with making the function static. Cc-ing
Bugseng folks.

>>> +/* Otherwise location given as a `reg` property. */
>>> +prop = fdt_get_property(fdt, node, "reg", NULL);
>>> +
>>> +if ( !prop )
>>> +{
>>> +printk("  No location for multiboot,module\n");
>>> +return -EINVAL;
>>> +}
>>> +if ( fdt_get_property(fdt, node, "module-index", NULL) )
>>> +{
>>> +printk("  Location of multiboot,module defined multiple times\n");
>>> +return -EINVAL;
>>> +}
>>> +
>>> +ret = read_fdt_prop_as_reg(prop, address_cells, size_cells, &addr, 
>>> &size);
>>> +
>>> +if ( ret < 0 )
>>> +{
>>> +printk("  Failed reading reg for multiboot,module\n");
>>> +return -EINVAL;
>>> +}
>>> +
>>> +idx = bi->nr_modules + 1;
>>
>> This at least looks like an off-by-one. If the addition of 1 is really
>> intended, I think it needs commenting on.
> 
> Seems to be, yes. The underlying array is a bit bizarre. It's sized as
> MAX_NR_BOOTMODS + 1, with the first one being the DTB itself. I guess
> the intent was to take it into account, but bi->nr_modules is
> initialised to the number of multiboot modules, so it SHOULD be already
> taking it into account.
> 
> Also, the logic for bounds checking seems... off (because of the + 1 I
> mentioned before). Or at least confusing, so I've moved to using
> ARRAY_SIZE(bi->mods) rather than explicitly comparing against
> MAX_NR_BOOTMODS.
> 
> The array is MAX_NR_BOOTMODS + 1 in length, so it's just more cognitive
> load than I'm comfortable with.

If I'm not mistaken the +1 is inherited from the modules array we had in
the past, where we wanted 1 extra slot for Xen itself. Hence before you
move to using ARRAY_SIZE() everywhere it needs to really be clear what
the +1 here is used for.

>>> --- a/xen/include/xen/libfdt/libfdt-xen.h
>>> +++ b/xen/include/xen/libfdt/libfdt-xen.h
>>> @@ -13,6 +13,63 @@
>>>  
>>>  #include 
>>>  
>>> +static inline int __init fdt_cell_as_u32(const fdt32_t *cell)
>>
>> Why plain int here, but ...
>>
>>> +{
>>> +return fdt32_to_cpu(*cell);
>>> +}
>>> +
>>> +static inline uint64_t  __init fdt_cell_as_u64(const fdt32_t *cell)
>>
>> ... a fixed-width and unsigned type here? Question is whether the former
>> helper is really warranted.
>>
>> Also nit: Stray double blank.
>>
>>> +{
>>> +return ((uint64_t)fdt32_to_cpu(cell[0]) << 32) | fdt32_to_cpu(cell[1]);
>>
>> That is - uniformly big endian?
> 
> These helpers are disappearing, so it doesn't matter. This is basically
> an open coded:
> 
>   fdt64_to_cpu(*(const fdt64_t *)fdt32)
> 
> And, yes. DTBs are standardised as having big-endian properties, for
> better or worse :/
> 
>>
>>> +}
>>
>> Marking such relatively generic inline functions __init is also somewhat
>> risky. 
> 
> They were originally in domain-builder/fdt.c and moved here as a result
> of a request to have them on libfdt. libfdt proved to be somewhat
> annoying because it would be hard to distinguish accessors for the
> flattened and the unflattened tree.
> 
> I'd personally have them in domain-builder instead, where they are used.
> Should they be needed somewhere else, we can always factor them out
> somewhere else.
> 
> Thoughts?

As long as they're needed only by domain-builder, it's probably fine to have
them just there.

Jan
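
For readers following the endianness point above: a minimal sketch of decoding big-endian 4-octet cells from raw bytes, which is the same effect as the open-coded helper in the quoted patch. The names be32_cell() and be64_from_cells() are hypothetical and are not part of libfdt or Xen; the sketch only assumes that DTB property data is stored big-endian, as stated in the thread.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only (not libfdt or Xen API).
 * Decode one 4-octet big-endian cell from a raw byte buffer.
 */
static uint32_t be32_cell(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Combine two consecutive cells into one 64-bit value: the first cell
 * holds the most significant 32 bits, the second the least significant.
 */
static uint64_t be64_from_cells(const uint8_t *p)
{
    return ((uint64_t)be32_cell(p) << 32) | be32_cell(p + 4);
}
```

Working on bytes rather than on a host-endian uint32_t keeps the sketch independent of the CPU's byte order.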

Re: [PATCH v3 7/7] arm/mpu: Implement setup_mpu for MPU system

2025-04-14 Thread Luca Fancellu

Hi Julien,

> On 14 Apr 2025, at 13:12, Julien Grall  wrote:
> 
> Hi Luca,
> 
> On 11/04/2025 23:56, Luca Fancellu wrote:
>> Implement the function setup_mpu that will logically track the MPU
>> regions defined by hardware registers, start introducing data
>> structures and functions to track the status from the C world.
>> The xen_mpumap_mask bitmap is used to track which MPU region are
>> enabled at runtime.
>> This function is called from setup_mm() which full implementation
>> will be provided in a later stage.
>> Signed-off-by: Luca Fancellu 
>> ---
>> v3 changes:
>>  - Moved PRENR_MASK define to common.
>> ---
>> ---
>>  xen/arch/arm/include/asm/mpu.h |  2 ++
>>  xen/arch/arm/mpu/mm.c  | 49 +-
>>  2 files changed, 50 insertions(+), 1 deletion(-)
>> diff --git a/xen/arch/arm/include/asm/mpu.h b/xen/arch/arm/include/asm/mpu.h
>> index eba5086cde97..77d0566f9780 100644
>> --- a/xen/arch/arm/include/asm/mpu.h
>> +++ b/xen/arch/arm/include/asm/mpu.h
>> @@ -20,6 +20,8 @@
>>  #define NUM_MPU_REGIONS_MASK(NUM_MPU_REGIONS - 1)
>>  #define MAX_MPU_REGIONS NUM_MPU_REGIONS_MASK
>>  +#define PRENR_MASK  GENMASK(31, 0)
>> +
>>  /* Access permission attributes. */
>>  /* Read/Write at EL2, No Access at EL1/EL0. */
>>  #define AP_RW_EL2 0x0
>> diff --git a/xen/arch/arm/mpu/mm.c b/xen/arch/arm/mpu/mm.c
>> index 635d1f5a2ba0..e0a40489a7fc 100644
>> --- a/xen/arch/arm/mpu/mm.c
>> +++ b/xen/arch/arm/mpu/mm.c
>> @@ -14,6 +14,17 @@
>>struct page_info *frame_table;
>>  +/* Maximum number of supported MPU memory regions by the EL2 MPU. */
>> +uint8_t __ro_after_init max_xen_mpumap;
> 
> Are this variable and ...
> 
>> +
>> +/*
>> + * Bitmap xen_mpumap_mask is to record the usage of EL2 MPU memory regions.
>> + * Bit 0 represents MPU memory region 0, bit 1 represents MPU memory
>> + * region 1, ..., and so on.
>> + * If a MPU memory region gets enabled, set the according bit to 1.
>> + */
>> +DECLARE_BITMAP(xen_mpumap_mask, MAX_MPU_REGIONS);
> 
> ... this one meant to be global? If yes, then they need to have a declaration 
> in the header. If not, then you want to add 'static'.

yes they are meant to be global, I’ll add a declaration in the header.

> 
>> +
>>  /* EL2 Xen MPU memory region mapping table. */
>>  pr_t xen_mpumap[MAX_MPU_REGIONS];
>>  @@ -222,9 +233,45 @@ pr_t pr_of_xenaddr(paddr_t base, paddr_t limit, 
>> unsigned attr)
>>  return region;
>>  }
>>  +/*
>> + * The code in this function needs to track the regions programmed in
>> + * arm64/mpu/head.S
>> + */
>> +static void __init setup_mpu(void)
>> +{
>> +register_t prenr;
>> +unsigned int i = 0;
>> +
>> +/*
>> + * MPUIR_EL2.Region[0:7] identifies the number of regions supported by
>> + * the EL2 MPU.
>> + */
>> +max_xen_mpumap = (uint8_t)(READ_SYSREG(MPUIR_EL2) & 
>> NUM_MPU_REGIONS_MASK);
>> +
>> +/* PRENR_EL2 has the N bit set if the N region is enabled, N < 32 */
>> +prenr = (READ_SYSREG(PRENR_EL2) & PRENR_MASK);
>> +
>> +/*
>> + * Set the bitfield for regions enabled in assembly boot-time.
>> + * This code works under the assumption that the code in head.S has
>> + * allocated and enabled regions below 32 (N < 32).
>> + 
> This is a bit fragile. I think it would be better if the bitmap is set by 
> head.S as we add the regions. Same for ...

So, I was trying to avoid that because in that case we need to place
xen_mpumap out of the BSS and start manipulating the bitmap from asm;
instead I was hoping to use the C code. I understand that if someone
wants to have more than 31 regions as boot regions this might break,
but it’s also a bit unlikely?

So I was balancing the pros of manipulating everything from the C world
against the cons (boot regions > 31).

Is it still your preferred way to handle everything from asm?

Cheers,
Luca
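
To make the trade-off above concrete, here is a minimal sketch of the C-side approach: mirroring a 32-bit PRENR-style enable mask into a larger software bitmap. Everything in it is an assumption for illustration (the struct, the helper names and the EX_ constants are hypothetical and are not Xen's bitmap API); it only relies on the statement in the quoted patch that bit N of PRENR_EL2 is set when region N is enabled, for N < 32.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical constants and types, not taken from Xen. */
#define EX_MAX_REGIONS   255
#define EX_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define EX_BITMAP_WORDS  \
    ((EX_MAX_REGIONS + EX_BITS_PER_WORD - 1) / EX_BITS_PER_WORD)

struct region_bitmap {
    unsigned long w[EX_BITMAP_WORDS];
};

static void bitmap_set_bit(struct region_bitmap *b, unsigned int n)
{
    b->w[n / EX_BITS_PER_WORD] |= 1UL << (n % EX_BITS_PER_WORD);
}

static int bitmap_test_bit(const struct region_bitmap *b, unsigned int n)
{
    return (int)((b->w[n / EX_BITS_PER_WORD] >> (n % EX_BITS_PER_WORD)) & 1UL);
}

/*
 * Copy the enable bits of a PRENR-style value into the bitmap and
 * return how many regions were found enabled. Only regions 0..31 can
 * be seen this way, which is the limitation discussed in the thread.
 */
static unsigned int mirror_prenr(struct region_bitmap *b, uint32_t prenr)
{
    unsigned int i, count = 0;

    for ( i = 0; i < 32; i++ )
    {
        if ( prenr & (UINT32_C(1) << i) )
        {
            bitmap_set_bit(b, i);
            count++;
        }
    }

    return count;
}
```

A boot region programmed at index 32 or above would not appear in the mask, so the bitmap would silently miss it; that is the fragility raised in the review.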

Re: [PATCH v3 10/16] x86/hyperlaunch: obtain cmdline from device tree

2025-04-14 Thread Jan Beulich

On 14.04.2025 16:23, Alejandro Vallejo wrote:
> On Thu Apr 10, 2025 at 12:12 PM BST, Jan Beulich wrote:
>> On 08.04.2025 18:07, Alejandro Vallejo wrote:
>>> --- a/xen/arch/x86/domain-builder/fdt.c
>>> +++ b/xen/arch/x86/domain-builder/fdt.c
>>> @@ -189,6 +189,12 @@ static int __init process_domain_node(
>>>  printk("  kernel: boot module %d\n", idx);
>>>  bi->mods[idx].type = BOOTMOD_KERNEL;
>>>  bd->kernel = &bi->mods[idx];
>>> +
>>> +/* If bootloader didn't set cmdline, see if FDT provides one. 
>>> */
>>> +if ( bd->kernel->cmdline_pa &&
>>> + !((char *)__va(bd->kernel->cmdline_pa))[0] )
>>> +bd->kernel->fdt_cmdline = fdt_get_prop_offset(
>>> +fdt, node, "bootargs", &bd->kernel->cmdline_pa);
>>
>> Somewhat orthogonal question: Should there perhaps be a way for the boot
>> loader provided cmdline to go at the tail of the DT provided one?
> 
> That would preclude the bootloader fully overriding what's on the DT.
> One can always just copy the cmdline in the DT to the bootloader and
> adjust whatever is necessary there for testing. Adding append behaviour
> sounds more like a hindrance rather than helpful. To me at least.

Well. This is why I have been pushing for all options to also have a
"negative" form. This way you can override whatever specifically you
need to override, without re-typing the entire (perhaps long) cmdline
from DT.

Also, I didn't mean that to necessarily be the one-and-only behavior.

Jan
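
As an illustration of the "negative form" idea above, here is a minimal sketch of a last-one-wins scan for a boolean option over a concatenated command line (DT part first, bootloader part appended). The function name ex_parse_bool() and the "no-" prefix convention are assumptions made for this example; it is not the parser of any particular kernel or of Xen.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: scan space-separated tokens left to right and
 * return the final state of boolean option `name`. A token equal to
 * `name` enables it, "no-<name>" disables it, and later tokens override
 * earlier ones. `dflt` is returned if the option never appears.
 */
static bool ex_parse_bool(const char *cmdline, const char *name, bool dflt)
{
    bool val = dflt;
    size_t nlen = strlen(name);
    const char *p = cmdline;

    while ( *p )
    {
        size_t len;

        while ( *p == ' ' )
            p++;

        len = strcspn(p, " ");

        if ( len == nlen && !strncmp(p, name, nlen) )
            val = true;
        else if ( len == nlen + 3 && !strncmp(p, "no-", 3) &&
                  !strncmp(p + 3, name, nlen) )
            val = false;

        p += len;
    }

    return val;
}
```

With such a scheme an appended bootloader string only has to carry the tokens it wants to flip, rather than a full copy of the DT-provided command line.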

Re: [PATCH v3 1/7] arm/mpu: Introduce MPU memory region map structure

2025-04-14 Thread Luca Fancellu

Hi Michal,

> On 14 Apr 2025, at 11:17, Orzel, Michal  wrote:
> 
> 
> 
> On 11/04/2025 16:56, Luca Fancellu wrote:
>> From: Penny Zheng 
>> 
>> Introduce pr_t typedef which is a structure having the prbar
>> and prlar members, each being structured as the registers of
>> the aarch64 armv8-r architecture.
>> 
>> Introduce the array 'xen_mpumap' that will store a view of
>> the content of the MPU regions.
>> 
>> Introduce MAX_MPU_REGIONS macro that uses the value of
>> NUM_MPU_REGIONS_MASK just for clarity, because using the
>> latter as number of elements of the xen_mpumap array might
>> be misleading.
> What should be the size of this array? I thought NUM_MPU_REGIONS indicates how
> many regions there can be (i.e. 256) and this should be the size. Yet you use
> MASK for size which is odd.

So the maximum number of regions for aarch64 armv8-r is 255; MPUIR_EL2.REGION
is an 8-bit field advertising the number of regions supported.

Is it better if I use just the below?

#define MAX_MPU_REGIONS 255

> 
>> 
>> Signed-off-by: Penny Zheng 
>> Signed-off-by: Wei Chen 
>> Signed-off-by: Luca Fancellu 
>> ---
>> xen/arch/arm/include/asm/arm64/mpu.h | 44 
>> xen/arch/arm/include/asm/mpu.h   |  5 
>> xen/arch/arm/mpu/mm.c|  4 +++
>> 3 files changed, 53 insertions(+)
>> create mode 100644 xen/arch/arm/include/asm/arm64/mpu.h
>> 
>> diff --git a/xen/arch/arm/include/asm/arm64/mpu.h 
>> b/xen/arch/arm/include/asm/arm64/mpu.h
>> new file mode 100644
>> index ..4d2bd7d7877f
>> --- /dev/null
>> +++ b/xen/arch/arm/include/asm/arm64/mpu.h
>> @@ -0,0 +1,44 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * mpu.h: Arm Memory Protection Unit definitions for aarch64.
> NIT: Do we really see the benefit in having such generic comments? What if you
> add a prototype of some function here. Will it fit into a definition scope?

I can remove the comment, but I would say that if I put some function
prototype here it should be related to arm64, since this file is under arm64.

> 
>> + */
>> +
>> +#ifndef __ARM_ARM64_MPU_H__
>> +#define __ARM_ARM64_MPU_H__
>> +
>> +#ifndef __ASSEMBLY__
>> +
>> +/* Protection Region Base Address Register */
>> +typedef union {
>> +struct __packed {
>> +unsigned long xn:2;   /* Execute-Never */
>> +unsigned long ap:2;   /* Acess Permission */
> s/Acess/Access/
> 
>> +unsigned long sh:2;   /* Sharebility */
> s/Sharebility/Shareability/
> 
>> +unsigned long base:46;/* Base Address */
>> +unsigned long pad:12;
> If you describe the register 1:1, why "pad" and not "res" or "res0"?
> 
>> +} reg;
>> +uint64_t bits;
>> +} prbar_t;
>> +
>> +/* Protection Region Limit Address Register */
>> +typedef union {
>> +struct __packed {
>> +unsigned long en:1; /* Region enable */
>> +unsigned long ai:3; /* Memory Attribute Index */
>> +unsigned long ns:1; /* Not-Secure */
>> +unsigned long res:1;/* Reserved 0 by hardware */
> res0 /* RES0 */
> 
>> +unsigned long limit:46; /* Limit Address */
>> +unsigned long pad:12;
> res1 /* RES0 */
> 
>> +} reg;
>> +uint64_t bits;
>> +} prlar_t;
>> +
>> +/* MPU Protection Region */
>> +typedef struct {
>> +prbar_t prbar;
>> +prlar_t prlar;
>> +} pr_t;
>> +
>> +#endif /* __ASSEMBLY__ */
>> +
>> +#endif /* __ARM_ARM64_MPU_H__ */
>> \ No newline at end of file
> Please add a new line at the end
> 
> Also, EMACS comment is missing.

Thanks I will fix all these findings

Cheers,
Luca

Re: [PATCH v3 1/7] arm/mpu: Introduce MPU memory region map structure

2025-04-14 Thread Luca Fancellu

Hi Michal,

> On 14 Apr 2025, at 16:01, Orzel, Michal  wrote:
> 
> 
> 
> On 14/04/2025 16:50, Luca Fancellu wrote:
>> Hi Michal,
>> 
>>> On 14 Apr 2025, at 11:17, Orzel, Michal  wrote:
>>> 
>>> 
>>> 
>>> On 11/04/2025 16:56, Luca Fancellu wrote:
>>>> From: Penny Zheng
>>>>
>>>> Introduce pr_t typedef which is a structure having the prbar
>>>> and prlar members, each being structured as the registers of
>>>> the aarch64 armv8-r architecture.
>>>>
>>>> Introduce the array 'xen_mpumap' that will store a view of
>>>> the content of the MPU regions.
>>>>
>>>> Introduce MAX_MPU_REGIONS macro that uses the value of
>>>> NUM_MPU_REGIONS_MASK just for clarity, because using the
>>>> latter as number of elements of the xen_mpumap array might
>>>> be misleading.
>>> What should be the size of this array? I thought NUM_MPU_REGIONS indicates 
>>> how
>>> many regions there can be (i.e. 256) and this should be the size. Yet you 
>>> use
>>> MASK for size which is odd.
>> 
>> So the maximum number of regions for aarch64 armv8-r are 255, 
>> MPUIR_EL2.REGION is an
>> 8 bit field advertising the number of region supported.
> So there can be max 255 regions. Ok.
> 
>> 
>> Is it better if I use just the below?
>> 
>> #define MAX_MPU_REGIONS 255
> If there are 255 regions, what NUM_MPU_REGIONS macro is for which stores 256?
> These two macros confuse me. Or is it that by your macro you want to denote 
> the
> max region number? In that case, the macro should be named MAX_MPU_REGION_NR 
> or
> alike.

I know, NUM_MPU_REGIONS should have a different name as it’s a bit misleading.
OK, I’ll name the macro I use here MAX_MPU_REGION_NR.

Cheers,
Luca
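
The relationship between the three values discussed in this thread can be sketched as follows. The EX_ names are hypothetical stand-ins chosen for this example; only the numeric relationships (256 as the size of the 8-bit value space, 255 as both the mask and the largest advertisable region count) come from the discussion above.

```c
#include <stdint.h>

/* Hypothetical names for illustration; values follow the thread. */
#define EX_NUM_MPU_REGIONS_SHIFT 8
#define EX_NUM_MPU_REGIONS       (1u << EX_NUM_MPU_REGIONS_SHIFT) /* 256 */
#define EX_NUM_MPU_REGIONS_MASK  (EX_NUM_MPU_REGIONS - 1)         /* 0xff */
#define EX_MAX_MPU_REGION_NR     EX_NUM_MPU_REGIONS_MASK          /* 255 */

/*
 * Extract the 8-bit region count from an MPUIR-style register value.
 * An 8-bit field can advertise at most 255, never 256.
 */
static uint8_t region_count_from_mpuir(uint64_t mpuir)
{
    return (uint8_t)(mpuir & EX_NUM_MPU_REGIONS_MASK);
}
```

Seen this way, the mask and the maximum count share a value only by coincidence of the field width, which is why a separate, clearly named macro for the array size reads better than reusing the mask.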

[PATCH v2.1 4/7] Shrink the rootfs substantially

2025-04-14 Thread Andrew Cooper

bash, busybox, musl and zlib are all in the base container.

python3 and ncurses are in principle used by bits of Xen, but not in anything
we test in CI.  argp-standalone, curl, dbus, libfdt, libgcc and sudo aren't
used at all (for x86 at least).

libbz2 and libuuid were pulled in transitively before, and need to be included
explicitly now.

Use apk --no-cache to avoid keeping a ~2M package index on disk.  Use apk
upgrade in case there are changes to the base container.

Remove the modules scan on boot.  We don't have or build any (except argo, and
that's handled specially).  This removes a chunk of warnings on boot.

This shrinks the rootfs from ~30M down to ~8M.

Factor out some x86-isms in preparation for ARM64 support.

No practical change.

Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Stefano Stabellini 
CC: Michal Orzel 
CC: Doug Goldstein 
CC: Marek Marczykowski-Górecki 

v2.1:
 * Extend commit message
 * Use apk upgrade

https://gitlab.com/xen-project/hardware/test-artifacts/-/jobs/9713228239
https://gitlab.com/xen-project/hardware/test-artifacts/-/jobs/9713228242
---
 scripts/alpine-rootfs.sh | 60 +++-
 1 file changed, 34 insertions(+), 26 deletions(-)

diff --git a/scripts/alpine-rootfs.sh b/scripts/alpine-rootfs.sh
index 75e2f8648ce5..b01de9709d02 100755
--- a/scripts/alpine-rootfs.sh
+++ b/scripts/alpine-rootfs.sh
@@ -4,33 +4,42 @@ set -eu
 
 WORKDIR="${PWD}"
 COPYDIR="${WORKDIR}/binaries"
+UNAME=$(uname -m)
 
-apk update
+apk --no-cache upgrade
 
-# xen runtime deps
-apk add musl
-apk add libgcc
-apk add openrc
-apk add busybox
-apk add sudo
-apk add dbus
-apk add bash
-apk add python3
-apk add zlib
-apk add lzo
-apk add ncurses
-apk add yajl
-apk add libaio
-apk add xz
-apk add util-linux
-apk add argp-standalone
-apk add libfdt
-apk add glib
-apk add pixman
-apk add curl
-apk add udev
-apk add pciutils
-apk add libelf
+PKGS=(
+# System
+openrc
+udev
+util-linux
+
+# Xen toolstack runtime deps
+libbz2
+libuuid
+lzo
+xz
+yajl
+
+# QEMU
+glib
+libaio
+pixman
+)
+
+case $UNAME in
+x86_64)
+PKGS+=(
+# System
+pciutils
+
+# QEMU
+libelf
+)
+;;
+esac
+
+apk add --no-cache "${PKGS[@]}"
 
 # Xen
 cd /
@@ -45,7 +54,6 @@ rc-update add dmesg sysinit
 rc-update add hostname boot
 rc-update add hwclock boot
 rc-update add hwdrivers sysinit
-rc-update add modules boot
 rc-update add killprocs shutdown
 rc-update add mount-ro shutdown
 rc-update add savecache shutdown
-- 
2.39.5

Re: [PATCH v3 6/6] CI: Include microcode for x86 hardware jobs

2025-04-14 Thread Marek Marczykowski-Górecki

On Mon, Apr 14, 2025 at 06:47:07PM +0100, Andrew Cooper wrote:
> On 14/04/2025 6:45 pm, Anthony PERARD wrote:
> > On Mon, Apr 14, 2025 at 12:09:03PM +0100, Andrew Cooper wrote:
> >> diff --git a/automation/gitlab-ci/build.yaml 
> >> b/automation/gitlab-ci/build.yaml
> >> index 1b82b359d01f..ac5367874526 100644
> >> --- a/automation/gitlab-ci/build.yaml
> >> +++ b/automation/gitlab-ci/build.yaml
> >> @@ -306,6 +306,7 @@ alpine-3.18-gcc-debug:
> >>CONFIG_ARGO=y
> >>CONFIG_UBSAN=y
> >>CONFIG_UBSAN_FATAL=y
> >> +  CONFIG_UCODE_SCAN_DEFAULT=y
> > Is there a change
> 
> DYM "chance" ?
> 
> >  that this patch series gets backported? Because that
> > new Kconfig option won't exist.
> 
> Yes, I do intend to backport this whole series in due course, and yes,
> I'm aware.

A more backport-friendly way would be to add ucode=scan to xen cmdline.

> > Otherwise, patch looks fine:
> > Reviewed-by: Anthony PERARD 
> 
> Thanks.
> 
> ~Andrew

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature

Re: [PATCH v3 6/6] CI: Include microcode for x86 hardware jobs

2025-04-14 Thread Anthony PERARD

On Mon, Apr 14, 2025 at 12:09:03PM +0100, Andrew Cooper wrote:
> diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
> index 1b82b359d01f..ac5367874526 100644
> --- a/automation/gitlab-ci/build.yaml
> +++ b/automation/gitlab-ci/build.yaml
> @@ -306,6 +306,7 @@ alpine-3.18-gcc-debug:
>CONFIG_ARGO=y
>CONFIG_UBSAN=y
>CONFIG_UBSAN_FATAL=y
> +  CONFIG_UCODE_SCAN_DEFAULT=y

Is there a change that this patch series gets backported? Because that
new Kconfig option won't exist.

Otherwise, patch looks fine:
Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

linux-6.15-rc2/drivers/xen/balloon.c:346: Possible int/long mixup

2025-04-14 Thread David Binderman

Hello there,

Static analyser cppcheck says:

linux-6.15-rc2/drivers/xen/balloon.c:346:24: style: int result is assigned to 
long variable. If the variable is long to avoid loss of information, then you 
have loss of information. [truncLongCastAssignment]

Source code is

unsigned long i, size = (1 << order);

Maybe better code:

unsigned long i, size = (1UL << order);

Regards

David Binderman
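
A small sketch of the difference cppcheck is pointing at. The two helpers below are hypothetical and are not code from balloon.c; they only contrast a shift evaluated in int and then widened with a shift evaluated directly in unsigned long.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: the literal 1 has type int, so the shift is
 * evaluated in int and only the result is converted to unsigned long.
 * This is well defined only while the result fits in an int.
 */
static unsigned long size_int_shift(unsigned int order)
{
    assert(order < sizeof(int) * CHAR_BIT - 1);
    return 1 << order;
}

/*
 * Hypothetical helper: the literal 1UL has type unsigned long, so the
 * shift is evaluated at the full width of the destination type.
 */
static unsigned long size_ulong_shift(unsigned int order)
{
    assert(order < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << order;
}
```

For small orders both forms give the same value; the 1UL form simply removes the dependency on the width of int, which is what the suggested change achieves.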

Re: [PATCH v3 08/16] x86/hyperlaunch: Add helpers to locate multiboot modules

2025-04-14 Thread Alejandro Vallejo

On Mon Apr 14, 2025 at 4:05 PM BST, Jan Beulich wrote:
> On 14.04.2025 15:37, Alejandro Vallejo wrote:
>> On Thu Apr 10, 2025 at 11:42 AM BST, Jan Beulich wrote:
>>> On 08.04.2025 18:07, Alejandro Vallejo wrote:
>>>> +/*
>>>> + * Locate a multiboot module given its node offset in the FDT.
>>>> + *
>>>> + * The module location may be given via either FDT property:
>>>> + * * reg = 
>>>> + * * Mutates `bi` to append the module.
>>>> + * * module-index = 
>>>> + * * Leaves `bi` unchanged.
>>>> + *
>>>> + * @param fdt           Pointer to the full FDT.
>>>> + * @param node          Offset for the module node.
>>>> + * @param address_cells Number of 4-octet cells that make up an "address".
>>>> + * @param size_cells    Number of 4-octet cells that make up a "size".
>>>> + * @param bi[inout]     Xen's representation of the boot parameters.
>>>> + * @return              -EINVAL on malformed nodes, otherwise
>>>> + *                      index inside `bi->mods`
>>>> + */
>>>> +int __init fdt_read_multiboot_module(const void *fdt, int node,
>>>> + int address_cells, int size_cells,
>>>> + struct boot_info *bi)
>>>
>>> Functions without callers and non-static ones without declarations are
>>> disliked by Misra.
>> 
>> Can't do much about it if I want them to stand alone in a single patch.
>> Otherwise the following ones become quite unwieldy to look at. All I can
>> say is that this function becomes static and with a caller on the next
>> patch.
>
> Which means you need to touch this again anyway. Perhaps we need a Misra
> deviation for __maybe_unused functions / data, in which case you could
> use that here and strip it along with making the function static. Cc-ing
> Bugseng folks.

It's a transient violation, sure. Do we care about transient MISRA
violations though? I understand the importance of bisectability, but
AIUI MISRA compliance matters to the extent that the tip is
compliant rather than the intermediate steps?

Another option would be to fold this patch and the next together
after both get their R-by. As I said, I assumed you'd rather see them in
isolation for purposes of review.

>
>>>> +/* Otherwise location given as a `reg` property. */
>>>> +prop = fdt_get_property(fdt, node, "reg", NULL);
>>>> +
>>>> +if ( !prop )
>>>> +{
>>>> +printk("  No location for multiboot,module\n");
>>>> +return -EINVAL;
>>>> +}
>>>> +if ( fdt_get_property(fdt, node, "module-index", NULL) )
>>>> +{
>>>> +printk("  Location of multiboot,module defined multiple times\n");
>>>> +return -EINVAL;
>>>> +}
>>>> +
>>>> +ret = read_fdt_prop_as_reg(prop, address_cells, size_cells, &addr, 
>>>> &size);
>>>> +
>>>> +if ( ret < 0 )
>>>> +{
>>>> +printk("  Failed reading reg for multiboot,module\n");
>>>> +return -EINVAL;
>>>> +}
>>>> +
>>>> +idx = bi->nr_modules + 1;
>>>
>>> This at least looks like an off-by-one. If the addition of 1 is really
>>> intended, I think it needs commenting on.
>> 
>> Seems to be, yes. The underlying array is a bit bizarre. It's sized as
>> MAX_NR_BOOTMODS + 1, with the first one being the DTB itself. I guess
>> the intent was to take it into account, but bi->nr_modules is
>> initialised to the number of multiboot modules, so it SHOULD be already
>> taking it into account.
>> 
>> Also, the logic for bounds checking seems... off (because of the + 1 I
>> mentioned before). Or at least confusing, so I've moved to using
>> ARRAY_SIZE(bi->mods) rather than explicitly comparing against
>> MAX_NR_BOOTMODS.
>> 
>> The array is MAX_NR_BOOTMODS + 1 in length, so it's just more cognitive
>> load than I'm comfortable with.
>
> If I'm not mistaken the +1 is inherited from the modules array we had in
> the past, where we wanted 1 extra slot for Xen itself. Hence before you
> move to using ARRAY_SIZE() everywhere it needs to really be clear what
> the +1 here is used for.

Ew.  Ok, just looked at the code in multiboot_fill_boot_info and indeed
the arrangement is for all multiboot modules to be in front, and Xen to
be appended. But bi->nr_modules only lists multiboot modules, so
increasing that value is therefore not enough (or
next_boot_module_index() would fail).

I need to have a proper read on how this is all stitched together.  I
may simply swap BOOTMOD_XEN with the next entry on append. Though my
preference would be to _not_ have Xen as part of the module list to
begin with. Before boot_info that was probably a place as good as any,
but this would be much better off in a dedicated field.

I don't see much in terms of usage though. Why is it being added at all?

Cheers,
Alejandro
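
One way to picture the "swap BOOTMOD_XEN with the next entry on append" idea mentioned above is the sketch below. All names and the tiny array size are hypothetical and chosen for the example; it is not the layout or API of Xen's struct boot_info, only an illustration assuming the arrangement described in the message: modules first, the Xen entry last, and nr_modules not counting the Xen entry.

```c
/* Hypothetical model, not Xen's actual definitions. */
#define EX_MAX_NR_BOOTMODS 4

enum ex_type { EX_UNKNOWN, EX_KERNEL, EX_XEN };

struct ex_boot_info {
    unsigned int nr_modules;                   /* excludes the Xen entry */
    enum ex_type mods[EX_MAX_NR_BOOTMODS + 1]; /* +1 slot holds Xen */
};

/*
 * Append a module while keeping the Xen entry in the slot right after
 * the last module. Returns the index of the new module, or -1 if all
 * module slots are in use.
 */
static int ex_append_module(struct ex_boot_info *bi, enum ex_type type)
{
    unsigned int idx = bi->nr_modules;

    if ( idx >= EX_MAX_NR_BOOTMODS )
        return -1;

    bi->mods[idx + 1] = bi->mods[idx]; /* move the Xen entry up one slot */
    bi->mods[idx] = type;              /* new module takes its old place */
    bi->nr_modules++;

    return (int)idx;
}
```

In this model the new index is nr_modules itself rather than nr_modules + 1, and the bound is the number of module slots rather than the array size; a dedicated field for Xen, as preferred in the message, would remove the need for the shuffle entirely.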

Re: [PATCH v3 6/6] CI: Include microcode for x86 hardware jobs

2025-04-14 Thread Andrew Cooper

On 14/04/2025 6:55 pm, Marek Marczykowski-Górecki wrote:
> On Mon, Apr 14, 2025 at 06:47:07PM +0100, Andrew Cooper wrote:
>> On 14/04/2025 6:45 pm, Anthony PERARD wrote:
>>> On Mon, Apr 14, 2025 at 12:09:03PM +0100, Andrew Cooper wrote:
>>>> diff --git a/automation/gitlab-ci/build.yaml 
>>>> b/automation/gitlab-ci/build.yaml
>>>> index 1b82b359d01f..ac5367874526 100644
>>>> --- a/automation/gitlab-ci/build.yaml
>>>> +++ b/automation/gitlab-ci/build.yaml
>>>> @@ -306,6 +306,7 @@ alpine-3.18-gcc-debug:
>>>>    CONFIG_ARGO=y
>>>>    CONFIG_UBSAN=y
>>>>    CONFIG_UBSAN_FATAL=y
>>>> +  CONFIG_UCODE_SCAN_DEFAULT=y
>>> Is there a change
>> DYM "chance" ?
>>
>>>  that this patch series gets backported? Because that
>>> new Kconfig option won't exist.
>> Yes, I do intend to backport this whole series in due course, and yes,
>> I'm aware.
> A more backport-friendly way would be to add ucode=scan to xen cmdline.

Yeah, but they're too long already IMO.

Needing to override defaults to make our CI system useful is a good hint
that the defaults are wrong.

Same for console_timestamps, which isn't even deployed consistently
across the testing.

~Andrew

Re: [PATCH v3 12/16] x86/hyperlaunch: add domain id parsing to domain config

2025-04-14 Thread Alejandro Vallejo

On Wed Apr 9, 2025 at 11:15 PM BST, Denis Mukhin wrote:
> On Tuesday, April 8th, 2025 at 9:07 AM, Alejandro Vallejo  
> wrote:
>
>> 
>> 
>> From: "Daniel P. Smith" dpsm...@apertussolutions.com
>> 
>> 
>> Introduce the ability to specify the desired domain id for the domain
>> definition. The domain id will be populated in the domid property of the
>> domain
>> node in the device tree configuration.
>> 
>> Signed-off-by: Daniel P. Smith dpsm...@apertussolutions.com
>> 
>> ---
>> v3:
>> * Remove ramdisk parsing
>> * Add missing xen/errno.h include
>> ---
>> xen/arch/x86/domain-builder/fdt.c | 39 -
>> xen/arch/x86/setup.c | 5 ++--
>> xen/include/xen/libfdt/libfdt-xen.h | 11 
>> 3 files changed, 52 insertions(+), 3 deletions(-)
>> 
>> diff --git a/xen/arch/x86/domain-builder/fdt.c 
>> b/xen/arch/x86/domain-builder/fdt.c
>> index 0f5fd01557..4c6aafe195 100644
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -8,6 +8,7 @@
>> #include 
>> 
>> 
>> #include 
>> 
>> +#include 
>> 
>> #include 
>> 
>> #include 
>> 
>> 
>> @@ -158,12 +159,42 @@ int __init fdt_read_multiboot_module(const void *fdt, 
>> int node,
>> static int __init process_domain_node(
>> struct boot_info *bi, const void *fdt, int dom_node)
>> {
>> - int node;
>> + int node, property;
>> struct boot_domain *bd = &bi->domains[bi->nr_domains];
>> 
>> const char *name = fdt_get_name(fdt, dom_node, NULL) ?: "unknown";
>> int address_cells = fdt_address_cells(fdt, dom_node);
>> int size_cells = fdt_size_cells(fdt, dom_node);
>> 
>> + fdt_for_each_property_offset(property, fdt, dom_node)
>> + {
>> + const struct fdt_property *prop;
>> + const char *prop_name;
>> + int name_len;
>> +
>> + prop = fdt_get_property_by_offset(fdt, property, NULL);
>> + if ( !prop )
>> + continue; /* silently skip */
>> +
>> + prop_name = fdt_get_string(fdt, fdt32_to_cpu(prop->nameoff), &name_len);
>> 
>> +
>> + if ( strncmp(prop_name, "domid", name_len) == 0 )
>> + {
>> + uint32_t val = DOMID_INVALID;
>> + if ( fdt_prop_as_u32(prop, &val) != 0 )
>> + {
>> + printk(" failed processing domain id for domain %s\n", name);
>
> Add XENLOG_ERR ?

Yes, and...

>
>> + return -EINVAL;
>> + }
>> + if ( val >= DOMID_FIRST_RESERVED )
>> 
>> + {
>> + printk(" invalid domain id for domain %s\n", name);
>
> Add XENLOG_ERR ?

... yes.

>
>> + return -EINVAL;
>> + }
>> + bd->domid = (domid_t)val;
>> 
>> + printk(" domid: %d\n", bd->domid);
>> 
>> + }
>> + }
>> +
>> fdt_for_each_subnode(node, fdt, dom_node)
>> {
>> if ( fdt_node_check_compatible(fdt, node, "multiboot,kernel") == 0 )
>> @@ -233,6 +264,12 @@ static int __init process_domain_node(
>> return -ENODATA;
>> }
>> 
>> + if ( bd->domid == DOMID_INVALID )
>> 
>> + bd->domid = get_initial_domain_id();
>> 
>> + else if ( bd->domid != get_initial_domain_id() )
>> 
>> + printk(XENLOG_WARNING
>> + "WARN: Booting without initial domid not supported.\n");
>
> Drop WARN since the log message is XENLOG_WARNING level already?

As mentioned elsewhere, the point of those prefixes are to be readable.

Though I'm starting to get urges to rewrite many of these error handlers
as asserts, on the basis that "why do we think it's ok to boot with
malformed DTBs"? A safe system that doesn't boot is more helpful than an
unsafe one that boots everything except a critical component for you to
find later on.

Cheers,
Alejandro

Re: [PATCH v3 16/16] x86/hyperlaunch: add capabilities to boot domain

2025-04-14 Thread Alejandro Vallejo

On Wed Apr 9, 2025 at 11:39 PM BST, Denis Mukhin wrote:
> On Tuesday, April 8th, 2025 at 9:07 AM, Alejandro Vallejo  
> wrote:
>
>> 
>> 
>> From: "Daniel P. Smith" dpsm...@apertussolutions.com
>> 
>> 
>> Introduce the ability to assign capabilities to a domain via its definition 
>> in
>> device tree. The first capability enabled to select is the control domain
>> capability. The capability property is a bitfield in both the device tree and
>> `struct boot_domain`.
>> 
>> Signed-off-by: Daniel P. Smith dpsm...@apertussolutions.com
>> 
>> Reviewed-by: Jason Andryuk jason.andr...@amd.com
>> 
>> Signed-off-by: Jason Andryuk jason.andr...@amd.com
>> 
>> ---
>> xen/arch/x86/domain-builder/core.c | 1 +
>> xen/arch/x86/domain-builder/fdt.c | 12 
>> xen/arch/x86/include/asm/boot-domain.h | 4 
>> xen/arch/x86/setup.c | 6 +-
>> 4 files changed, 22 insertions(+), 1 deletion(-)
>> 
>> diff --git a/xen/arch/x86/domain-builder/core.c 
>> b/xen/arch/x86/domain-builder/core.c
>> index 510a74a675..6ab4e6fe53 100644
>> --- a/xen/arch/x86/domain-builder/core.c
>> +++ b/xen/arch/x86/domain-builder/core.c
>> @@ -96,6 +96,7 @@ void __init builder_init(struct boot_info *bi)
>> i = first_boot_module_index(bi, BOOTMOD_UNKNOWN);
>> bi->mods[i].type = BOOTMOD_KERNEL;
>> 
>> bi->domains[0].kernel = &bi->mods[i];
>> 
>> + bi->domains[0].capabilities |= BUILD_CAPS_CONTROL;
>> 
>> bi->nr_domains = 1;
>> 
>> }
>> }
>> diff --git a/xen/arch/x86/domain-builder/fdt.c 
>> b/xen/arch/x86/domain-builder/fdt.c
>> index 5fcb767bdd..dbfbcffb0a 100644
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -257,6 +257,18 @@ static int __init process_domain_node(
>> bd->max_vcpus = val;
>> 
>> printk(" max vcpus: %d\n", bd->max_vcpus);
>> 
>> }
>> + else if ( strncmp(prop_name, "capabilities", name_len) == 0 )
>> + {
>> + if ( fdt_prop_as_u32(prop, &bd->capabilities) != 0 )
>> 
>> + {
>> + printk(" failed processing domain id for domain %s\n", name);
>
> Suggest adding XENLOG_ERR to the error message.

Yes, and the message itself seems bogus. The dangers of copy-paste...

Will fix both.

>
>> + return -EINVAL;
>> + }
>> + printk(" caps: ");
>> + if ( bd->capabilities & BUILD_CAPS_CONTROL )
>> 
>> + printk("c");
>
> Perhaps wrap string generation into a separate function?
>
> That will help if the number of capabilities will grow over time
> and if there will be a need to use string representation somewhere else
> in the code.
>
> Thoughts?

If/when such other code appears I'm happy to unify them, but until then
I'd rather reduce indirection if possible and keep it inlined.

Cheers,
Alejandro

[XEN PATCH] tools/tests: Fix newly introduced Makefile

2025-04-14 Thread Anthony PERARD

From: Anthony PERARD 

Fix a few issues with this new directory:
- clean generated files
- and ignore those generated files
- include the dependency files generated by `gcc`.
- rework prerequisites:
  "test-rangeset.o" also needs the generated files "list.h" and
  "rangeset.h". Technically, both only need "harness.h" which needs
  the generated headers, but that's a bit simpler and the previous
  point will add the dependency on "harness.h" automatically.

This last point fixes an issue where `make` might decide to build
"test-rangeset.o" before the other files are ready.

Fixes: 7bf777b42cad ("tootls/tests: introduce unit tests for rangesets")
Signed-off-by: Anthony PERARD 
---

Make doesn't need the *.h to generate the .c. So removing that
prerequisite means make can generate all 3 at the same time.
---
 tools/tests/rangeset/.gitignore | 4 
 tools/tests/rangeset/Makefile   | 8 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)
 create mode 100644 tools/tests/rangeset/.gitignore

diff --git a/tools/tests/rangeset/.gitignore b/tools/tests/rangeset/.gitignore
new file mode 100644
index 00..cdeb778535
--- /dev/null
+++ b/tools/tests/rangeset/.gitignore
@@ -0,0 +1,4 @@
+/list.h
+/rangeset.c
+/rangeset.h
+/test-rangeset
diff --git a/tools/tests/rangeset/Makefile b/tools/tests/rangeset/Makefile
index 70076eff34..3dafcbd054 100644
--- a/tools/tests/rangeset/Makefile
+++ b/tools/tests/rangeset/Makefile
@@ -12,7 +12,7 @@ run: $(TARGET)
 
 .PHONY: clean
 clean:
-   $(RM) -- *.o $(TARGET) $(DEPS_RM)
+   $(RM) -- *.o $(TARGET) $(DEPS_RM) list.h rangeset.h rangeset.c
 
 .PHONY: distclean
 distclean: clean
@@ -32,7 +32,7 @@ rangeset.h: $(XEN_ROOT)/xen/include/xen/rangeset.h
 list.h rangeset.h:
sed -e '/#include/d' <$< >$@
 
-rangeset.c: $(XEN_ROOT)/xen/common/rangeset.c list.h rangeset.h
+rangeset.c: $(XEN_ROOT)/xen/common/rangeset.c
# Remove includes and add the test harness header
sed -e '/#include/d' -e '1s/^/#include "harness.h"/' <$< >$@
 
@@ -42,5 +42,9 @@ CFLAGS += $(CFLAGS_xeninclude)
 
 LDFLAGS += $(APPEND_LDFLAGS)
 
+test-rangeset.o rangeset.o: list.h rangeset.h
+
 test-rangeset: rangeset.o test-rangeset.o
$(CC) $^ -o $@ $(LDFLAGS)
+
+-include $(DEPS_INCLUDE)
-- 
Anthony PERARD

Re: [PATCH v2 0/5] Fix lazy mmu mode

2025-04-14 Thread Ryan Roberts

On 10/04/2025 17:07, Alexander Gordeev wrote:
> On Mon, Mar 03, 2025 at 02:15:34PM +, Ryan Roberts wrote:
> 
> Hi Ryan,
> 
>> I'm planning to implement lazy mmu mode for arm64 to optimize vmalloc. As part
>> of that, I will extend lazy mmu mode to cover kernel mappings in vmalloc table
>> walkers. While lazy mmu mode is already used for kernel mappings in a few
>> places, this will extend its use significantly.
>>
>> Having reviewed the existing lazy mmu implementations in powerpc, sparc and x86,
>> it looks like there are a bunch of bugs, some of which may be more likely to
>> trigger once I extend the use of lazy mmu.
> 
> Do you have any idea about generic code issues as result of not adhering to
> the originally stated requirement:
> 
>   /*
>...
>* the PTE updates which happen during this window.  Note that using this
>* interface requires that read hazards be removed from the code.  A read
>* hazard could result in the direct mode hypervisor case, since the actual
>* write to the page tables may not yet have taken place, so reads though
>* a raw PTE pointer after it has been modified are not guaranteed to be
>* up to date.
>...
>*/
> 
> I tried to follow a few code paths and at least this one does not look so good:
> 
> copy_pte_range(..., src_pte, ...)
>   ret = copy_nonpresent_pte(..., src_pte, ...)
>     try_restore_exclusive_pte(..., src_pte, ...)  // is_device_exclusive_entry(entry)
>       restore_exclusive_pte(..., ptep, ...)
>         set_pte_at(..., ptep, ...)
>           set_pte(ptep, pte);                     // save in lazy mmu mode
> 
>   // ret == -ENOENT
> 
>   ptent = ptep_get(src_pte);                      // lazy mmu save is not observed
>   ret = copy_present_ptes(..., ptent, ...);       // wrong ptent used
> 
> I am not aware whether the effort to "read hazards be removed from the code"
> has ever been made and the generic code is safe in this regard.
> 
> What is your take on this?

Hmm, that looks like a bug to me, at least based on the stated requirements.
Although this is not a "read through a raw PTE *pointer*", it is a ptep_get().
The arch code can override that so I guess it has an opportunity to flush. But I
don't think any arches are currently doing that.

Probably the simplest fix is to add arch_flush_lazy_mmu_mode() before the
ptep_get()?
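
To make the hazard concrete, here is a minimal user-space model, not kernel
code. The names lazy_set_pte(), lazy_flush() and raw_get_pte() are
hypothetical stand-ins for a lazy-mode set_pte(), arch_flush_lazy_mmu_mode()
and a ptep_get() that does not flush; the single-slot queue is an assumption
made only to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;

/* One deferred write; a real implementation would batch many. */
static struct {
    pte_t *ptep;
    pte_t val;
    bool pending;
} lazy_queue;

/* Apply the deferred write, if any (models arch_flush_lazy_mmu_mode()). */
void lazy_flush(void)
{
    if ( lazy_queue.pending )
    {
        *lazy_queue.ptep = lazy_queue.val;
        lazy_queue.pending = false;
    }
}

/* Queue a PTE write instead of performing it (models a lazy set_pte()). */
void lazy_set_pte(pte_t *ptep, pte_t val)
{
    assert(ptep);
    lazy_flush();              /* single slot: drain any earlier write first */
    lazy_queue.ptep = ptep;
    lazy_queue.val = val;
    lazy_queue.pending = true;
}

/* Read straight from memory (models a ptep_get() that does not flush). */
pte_t raw_get_pte(const pte_t *ptep)
{
    return *ptep;
}
```

In this model a raw_get_pte() issued right after lazy_set_pte() still
returns the old value; only after lazy_flush() does the read observe the
update, which is the ordering the suggested fix would enforce.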

It won't be a problem in practice for arm64, since the pgtables are always
updated immediately. I just want to use these hooks to defer/batch barriers in
certain cases.

And this is a pre-existing issue for the arches that use lazy mmu with
device-exclusive mappings, which my extending lazy mmu into vmalloc won't
exacerbate.

Would you be willing/able to submit a fix?

Thanks,
Ryan


> 
> Thanks!

Re: [PATCH v3 16/16] x86/hyperlaunch: add capabilities to boot domain

2025-04-14 Thread Alejandro Vallejo

On Thu Apr 10, 2025 at 1:18 PM BST, Jan Beulich wrote:
> On 08.04.2025 18:07, Alejandro Vallejo wrote:
>> From: "Daniel P. Smith" 
>> 
>> Introduce the ability to assign capabilities to a domain via its definition in
>> device tree. The first capability enabled to select is the control domain
>> capability. The capability property is a bitfield in both the device tree and
>> `struct boot_domain`.
>> 
>> Signed-off-by: Daniel P. Smith 
>> Reviewed-by: Jason Andryuk 
>> Signed-off-by: Jason Andryuk 
>
> The R-b feels kind of redundant with the subsequent S-o-b.

I'll drop it.

>
>> --- a/xen/arch/x86/domain-builder/fdt.c
>> +++ b/xen/arch/x86/domain-builder/fdt.c
>> @@ -257,6 +257,18 @@ static int __init process_domain_node(
>>  bd->max_vcpus = val;
>>  printk("  max vcpus: %d\n", bd->max_vcpus);
>>  }
>> +else if ( strncmp(prop_name, "capabilities", name_len) == 0 )
>> +{
>> +if ( fdt_prop_as_u32(prop, &bd->capabilities) != 0 )
>> +{
>> +printk("  failed processing domain id for domain %s\n", 
>> name);
>> +return -EINVAL;
>> +}
>> +printk("  caps: ");
>> +if ( bd->capabilities & BUILD_CAPS_CONTROL )
>> +printk("c");
>> +printk("\n");
>> +}
>
> Like for the other patch: What about other bits being set in the value read?

I take it that the non-worded suggestion is to have a mask of reserved
bits for each case and check they are not set (giving a warning if they are)?
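
Something along these lines, as a rough sketch only. The value of
BUILD_CAPS_CONTROL (bit 0) and the BUILD_CAPS_KNOWN mask are assumptions for
illustration, and caps_unknown_bits()/caps_sanitise() are hypothetical
helpers, not anything in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; the real definition is not shown in this thread. */
#define BUILD_CAPS_CONTROL  (1u << 0)
/* Hypothetical mask of every capability bit this build understands. */
#define BUILD_CAPS_KNOWN    (BUILD_CAPS_CONTROL)

/* Return the bits of caps that are not understood; 0 means all are known. */
uint32_t caps_unknown_bits(uint32_t caps)
{
    return caps & ~(uint32_t)BUILD_CAPS_KNOWN;
}

/* Drop unknown bits; a caller would warn when caps_unknown_bits() != 0. */
uint32_t caps_sanitise(uint32_t caps)
{
    uint32_t known = caps & BUILD_CAPS_KNOWN;

    assert(caps_unknown_bits(known) == 0);
    return known;
}
```

Whether unknown bits should be a warning or a hard -EINVAL is a policy
choice this sketch leaves open.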

>
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -1006,6 +1006,7 @@ static struct domain *__init create_dom0(struct boot_info *bi)
>>  {
>>  char *cmdline = NULL;
>>  size_t cmdline_size;
>> +unsigned int create_flags = 0;
>>  struct xen_domctl_createdomain dom0_cfg = {
>>  .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
>>  .max_evtchn_port = -1,
>> @@ -1037,7 +1038,10 @@ static struct domain *__init create_dom0(struct boot_info *bi)
>>  if ( bd->domid == DOMID_INVALID )
>>  /* Create initial domain.  Not d0 for pvshim. */
>>  bd->domid = get_initial_domain_id();
>> -d = domain_create(bd->domid, &dom0_cfg, pv_shim ? 0 : CDF_privileged);
>> +if ( bd->capabilities & BUILD_CAPS_CONTROL )
>> +create_flags |= CDF_privileged;
>
> Seeing that builder_init() in the non-DT case sets the new bit unconditionally,
> isn't the shim's only domain suddenly getting CDF_privileged set this way? Oh,
> no, you then ...
>
>> +d = domain_create(bd->domid, &dom0_cfg,
>> +  pv_shim ? 0 : create_flags);
>
> ... hide the flag here. Any reason to have the intermediate variable in the
> first place

Well, the logic would end up fairly convoluted otherwise. As things
stand this can be encoded in an if-else fashion with 2 calls, but
there's 2 capability flags coming that need integrating together.

This is just avoiding further code motion down the line.

> (can't resist: when there's already a wall of local variables here)?

Heh :). Point taken.

Cheers,
Alejandro

[PATCH v4 1/3] drivers: Change amd_iommu struct to contain pci_sbdf_t, simplify code

2025-04-14 Thread Andrii Sultanov

From: Andrii Sultanov 

Following on from 250d87dc3ff9 ("x86/msi: Change __msi_set_enable() to
take pci_sbdf_t"), make struct amd_iommu contain pci_sbdf_t directly
instead of specifying seg+bdf separately and regenerating sbdf_t from them,
which simplifies code.
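
As a rough illustration of why the combined type helps, the sketch below
uses a simplified stand-in, sbdf_model_t, rather than Xen's real pci_sbdf_t.
The layout (segment in the high 16 bits) and all helper names are
assumptions of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for pci_sbdf_t: segment in the high 16 bits. */
typedef struct {
    uint32_t sbdf;
} sbdf_model_t;

sbdf_model_t sbdf_make(uint16_t seg, uint16_t bdf)
{
    sbdf_model_t s = { ((uint32_t)seg << 16) | bdf };

    return s;
}

uint16_t sbdf_seg(sbdf_model_t s) { return (uint16_t)(s.sbdf >> 16); }
uint16_t sbdf_bdf(sbdf_model_t s) { return (uint16_t)(s.sbdf & 0xffff); }

/* Before: two comparisons on separately stored fields. */
bool same_device_split(uint16_t seg_a, uint16_t bdf_a,
                       uint16_t seg_b, uint16_t bdf_b)
{
    return (seg_a == seg_b) && (bdf_a == bdf_b);
}

/* After: one comparison on the combined value. */
bool same_device_sbdf(sbdf_model_t a, sbdf_model_t b)
{
    return a.sbdf == b.sbdf;
}
```

Both predicates agree for any inputs; the combined form simply avoids
rebuilding the value from its parts at each use.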

Bloat-o-meter reports:
add/remove: 0/0 grow/shrink: 4/13 up/down: 121/-377 (-256)
Function                                old     new   delta
_einittext                            22028   22092     +64
amd_iommu_prepare                       853     897     +44
__mon_lengths                          2928    2936      +8
_invalidate_all_devices                 133     138      +5
_hvm_dpci_msi_eoi                       157     155      -2
build_info                              752     744      -8
amd_iommu_add_device                    856     844     -12
amd_iommu_msi_enable                     33      20     -13
update_intremap_entry_from_msi_msg      879     859     -20
amd_iommu_msi_msg_update_ire            472     448     -24
send_iommu_command                      251     224     -27
amd_iommu_get_supported_ivhd_type        86      54     -32
amd_iommu_detect_one_acpi               918     886     -32
iterate_ivrs_mappings                   169     129     -40
flush_command_buffer                    460     417     -43
set_iommu_interrupt_handler             421     377     -44
enable_iommu                           1745    1665     -80

Resolves: https://gitlab.com/xen-project/xen/-/issues/198

Reported-by: Andrew Cooper 
Signed-off-by: Andrii Sultanov 

---
Changes in V4:
* Dropped references to the order of seg/bdf in the commit message
* Dropped unnecessary detail from the commit message
* Reverted to a macro usage in one case where it was mistakenly dropped
* Folded several separate seg+bdf comparisons into a single one between
  sbdf_t, folded separate assignments with a macro.
* More code size improvements with the changes, so I've refreshed the
  bloat-o-meter report

Changes in V3:
* Dropped the union with seg+bdf/pci_sbdf_t to avoid aliasing, renamed
  all users appropriately

Changes in V2:
* Split single commit into several patches
* Added the commit title of the referenced patch
* Dropped brackets around &(iommu->sbdf) and &(sbdf)
---
 xen/drivers/passthrough/amd/iommu.h |  4 +--
 xen/drivers/passthrough/amd/iommu_acpi.c| 16 +-
 xen/drivers/passthrough/amd/iommu_cmd.c |  8 ++---
 xen/drivers/passthrough/amd/iommu_detect.c  | 18 +--
 xen/drivers/passthrough/amd/iommu_init.c| 35 ++---
 xen/drivers/passthrough/amd/iommu_intr.c| 29 -
 xen/drivers/passthrough/amd/iommu_map.c |  4 +--
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 22 ++---
 8 files changed, 67 insertions(+), 69 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
index 00e81b4b2a..ba541f7943 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -77,8 +77,8 @@ struct amd_iommu {
 struct list_head list;
 spinlock_t lock; /* protect iommu */
 
-u16 seg;
-u16 bdf;
+pci_sbdf_t sbdf;
+
 struct msi_desc msi;
 
 u16 cap_offset;
diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c
index 5bdbfb5ba8..025d9be40f 100644
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -58,7 +58,7 @@ static void __init add_ivrs_mapping_entry(
 uint16_t bdf, uint16_t alias_id, uint8_t flags, unsigned int ext_flags,
 bool alloc_irt, struct amd_iommu *iommu)
 {
-struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
+struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->sbdf.seg);
 
 ASSERT( ivrs_mappings != NULL );
 
@@ -70,7 +70,7 @@ static void __init add_ivrs_mapping_entry(
 ivrs_mappings[bdf].device_flags = flags;
 
 /* Don't map an IOMMU by itself. */
-if ( iommu->bdf == bdf )
+if ( iommu->sbdf.bdf == bdf )
 return;
 
 /* Allocate interrupt remapping table if needed. */
@@ -96,7 +96,7 @@ static void __init add_ivrs_mapping_entry(
 
 if ( !ivrs_mappings[alias_id].intremap_table )
 panic("No memory for %pp's IRT\n",
-  &PCI_SBDF(iommu->seg, alias_id));
+  &PCI_SBDF(iommu->sbdf.seg, alias_id));
 }
 }
 
@@ -112,7 +112,7 @@ static struct amd_iommu * __init find_iommu_from_bdf_cap(
 struct amd_iommu *iommu;
 
 for_each_amd_iommu ( iommu )
-if ( (iommu->seg == seg) && (iommu->bdf == bdf) &&
+if ( (iommu->sbdf.seg == seg) && (iommu->sbdf.bdf == bdf) &&
  (iommu->cap_offset == cap_offset) )
 return iommu;
 
@@ -297,13 +297,13 @@ static int __init register_range_for_iommu_devices(
 /* reserve unity-mapped page entries for device

Re: [PATCH v2 2/7] Overhaul how Argo is built and packged

2025-04-14 Thread Marek Marczykowski-Górecki

On Mon, Apr 14, 2025 at 11:18:38AM +0100, Andrew Cooper wrote:
> --- a/scripts/build-linux.sh
> +++ b/scripts/build-linux.sh
> @@ -8,7 +8,7 @@ fi
>  set -ex -o pipefail
>  
>  WORKDIR="${PWD}"
> -COPYDIR="${WORKDIR}/binaries/"
> +COPYDIR="${WORKDIR}/binaries"

Is this change intentional? It has worse failure mode if "binaries" dir
wouldn't exist for some reason...

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



[PATCH v2 7/7] Package microcode for the x86 hardware runners

2025-04-14 Thread Andrew Cooper

They are all out of date, to different degrees.

Install jq into the x86_64 build container so we can parse the Github latest
release information in an acceptable way.

The resulting archive must be uncompressed, in order to work during early
boot.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jason Andryuk 
---
CC: Anthony PERARD 
CC: Stefano Stabellini 
CC: Michal Orzel 
CC: Doug Goldstein 
CC: Marek Marczykowski-Górecki 
---
 .gitlab-ci.yml |  4 +++
 images/alpine/3.18-x86_64-build.dockerfile |  3 ++
 scripts/x86-microcode.sh   | 42 ++
 3 files changed, 49 insertions(+)
 create mode 100755 scripts/x86-microcode.sh

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index d70ddd99e529..74335363d5ed 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -64,6 +64,10 @@ linux-6.6.56-x86_64:
 ARGO_SHA: "705a7a8a624b42e13e655d3042059b8a85cdf6a3"
 ARGOEXEC_SHA: "d900429f6640acc6f68a3d3a4c945d7da60625d8"
 
+microcode-x86:
+  extends: .x86_64-artifacts
+  script: ./scripts/x86-microcode.sh
+
 #
 # The jobs below here are legacy and being phased out.
 #
diff --git a/images/alpine/3.18-x86_64-build.dockerfile b/images/alpine/3.18-x86_64-build.dockerfile
index eac0cda4fed3..c4ff30e1f138 100644
--- a/images/alpine/3.18-x86_64-build.dockerfile
+++ b/images/alpine/3.18-x86_64-build.dockerfile
@@ -27,6 +27,9 @@ RUN  intel-latest.json
+TARBALL_URL="$(jq -r .tarball_url intel-latest.json)"
+curl -fsSL "${TARBALL_URL}" > intel-latest.tar
+tar xf intel-latest.tar --strip-components=1
+
+(
+cd intel-ucode
+cat 06-97-02 # adl-*
+cat 06-8e-09 # kbl-*
+) > "${UCODEDIR}/GenuineIntel.bin"
+
+#
+# AMD microcode comes from linux-firmware
+#
+curl -fsSLO https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amd-ucode/microcode_amd_fam17h.bin
+curl -fsSLO https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amd-ucode/microcode_amd_fam19h.bin
+
+(
+cat microcode_amd_fam17h.bin # zen2-*, xilinux-*-x86_64-*
+cat microcode_amd_fam19h.bin # zen3p-*
+) > "${UCODEDIR}/AuthenticAMD.bin"
+
+# Package everything up.  It must be uncompressed
+cd "${ROOTDIR}"
+find . | cpio -R 0:0 -H newc -o > "${COPYDIR}/ucode.cpio"
+
+# Print the contents for the build log
+cpio -tv < "${COPYDIR}/ucode.cpio"
-- 
2.39.5

Re: [PATCH v2 2/7] Overhaul how Argo is built and packged

2025-04-14 Thread Andrew Cooper

On 14/04/2025 11:35 am, Marek Marczykowski-Górecki wrote:
> On Mon, Apr 14, 2025 at 11:18:38AM +0100, Andrew Cooper wrote:
>> --- a/scripts/build-linux.sh
>> +++ b/scripts/build-linux.sh
>> @@ -8,7 +8,7 @@ fi
>>  set -ex -o pipefail
>>  
>>  WORKDIR="${PWD}"
>> -COPYDIR="${WORKDIR}/binaries/"
>> +COPYDIR="${WORKDIR}/binaries"
> Is this change intentional? It has worse failure mode if "binaries" dir
> wouldn't exist for some reason...

Yes it is intentional.  It causes problems when we derive new variables
from COPYDIR.

binaries/ always exists.  It's in the base repo.

~Andrew

Re: [PATCH v2 0/2] Add support for MSI injection on Arm

2025-04-14 Thread Julien Grall


Hi Mykyta,

On 14/04/2025 18:51, Mykyta Poturai wrote:

> This series adds the base support for MSI injection on Arm. This is
> needed to streamline virtio-pci interrupt triggering.
>
> With this patches, MSIs can be triggered in guests by issuing the new
> DM op, inject_msi2. This op is similar to inject_msi, but it allows
> to specify the source id of the MSI.
>
> We chose the approach of adding a new DM op instead of using the pad
> field of inject_msi because we have no clear way of distinguishing
> between set and unset pad fields. New implementations also adds flags
> field to clearly specify if the SBDF is set.
>
> Patches were tested on QEMU with

[...]

> patches for ITS support for DomUs applied.


This means this series is unusable without external patches. Given this
is adding a new DM operation, I think it would be more sensible to have
the vITS support merged first. Then we can look at merging this series.


Cheers,

--
Julien Grall

[PATCH RFC 7/6] CI: Adjust how domU is packaged in dom0

2025-04-14 Thread Andrew Cooper

Package domU in /boot for dom0 and insert into the uncompressed part of dom0's
rootfs, rather than recompressing it as part of the overlay.

Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Stefano Stabellini 
CC: Michal Orzel 
CC: Doug Goldstein 
CC: Marek Marczykowski-Górecki 

A little RFC.  It wants extending to the other tests too.
---
 automation/scripts/qubes-x86-64.sh | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index 1f90e7002c73..7ce077dfeaee 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -185,10 +185,22 @@ Kernel \r on an \m (\l)
 find . | cpio -H newc -o | gzip >> ../binaries/domU-rootfs.cpio.gz
 cd ..
 rm -rf rootfs
+
+# Package domU kernel+rootfs in /boot for dom0 (uncompressed)
+mkdir -p rootfs/boot
+cd rootfs
+cp ../binaries/bzImage boot/vmlinuz
+cp ../binaries/domU-rootfs.cpio.gz boot/
+find . | cpio -H newc -o > ../binaries/domU-in-dom0.cpio
+cd ..
+rm -rf rootfs
 fi
 
 # Dom0 rootfs
 cp binaries/ucode.cpio binaries/dom0-rootfs.cpio.gz
+if [ -e binaries/domU-in-dom0.cpio ]; then
+cat binaries/domU-in-dom0.cpio >> binaries/dom0-rootfs.cpio.gz
+fi
 cat binaries/rootfs.cpio.gz >> binaries/dom0-rootfs.cpio.gz
 cat binaries/xen-tools.cpio.gz >> binaries/dom0-rootfs.cpio.gz
 
@@ -236,10 +248,6 @@ mkdir -p etc/default
 echo "XENCONSOLED_TRACE=all" >> etc/default/xencommons
 echo "QEMU_XEN=/bin/false" >> etc/default/xencommons
 mkdir -p var/log/xen/console
-cp ../binaries/bzImage boot/vmlinuz
-if [ -n "$domU_check" ]; then
-cp ../binaries/domU-rootfs.cpio.gz boot/initrd-domU
-fi
 find . | cpio -H newc -o | gzip >> ../binaries/dom0-rootfs.cpio.gz
 cd ..
 
-- 
2.39.5

Re: [PATCH v6 1/3] xen/arm: Move some of the functions to common file

2025-04-14 Thread Ayan Kumar Halder


Hi,

I will keep Michal's R-b with one small change.

On 11/04/2025 12:04, Ayan Kumar Halder wrote:

regions.inc is added to hold the common earlyboot MPU regions configuration
between arm64 and arm32.

prepare_xen_region, fail_insufficient_regions() will be used by both arm32 and
arm64. Thus, they have been moved to regions.inc.

*_PRBAR are moved to arm64/sysregs.h.
*_PRLAR are moved to regions.inc as they are common between arm32 and arm64.

Introduce WRITE_SYSREG_ASM to write to the system registers from regions.inc.

Signed-off-by: Ayan Kumar Halder 
Reviewed-by: Luca Fancellu 
---

Changes from

v1 -

1. enable_mpu() now sets HMAIR{0,1} registers. This is similar to what is
being done in enable_mmu(). All the mm related configurations happen in this
function.

2. Fixed some typos.

v2 -
1. Extracted the arm64 head.S functions/macros in a common file.

v3 -
1. *_PRLAR are moved to prepare_xen_region.inc

2. enable_boot_cpu_mm() is preserved in mpu/head.S.

3. STORE_SYSREG is renamed as WRITE_SYSREG_ASM()

4. LOAD_SYSREG is removed.

5. No need to save/restore lr in enable_boot_cpu_mm(). IOW, keep it as it was
in the original code.

v4 -
1. Rename prepare_xen_region.inc to common.inc

2. enable_secondary_cpu_mm() is moved back to mpu/head.S.

v5 -
1. Rename common.inc to regions.inc.

2. WRITE_SYSREG_ASM() is enclosed within #ifdef __ASSEMBLY__.

  xen/arch/arm/arm64/mpu/head.S| 78 +--
  xen/arch/arm/include/asm/arm64/sysregs.h | 13 
  xen/arch/arm/include/asm/mpu/regions.inc | 79 
  3 files changed, 93 insertions(+), 77 deletions(-)
  create mode 100644 xen/arch/arm/include/asm/mpu/regions.inc

diff --git a/xen/arch/arm/arm64/mpu/head.S b/xen/arch/arm/arm64/mpu/head.S
index ed01993d85..6d336cafbb 100644
--- a/xen/arch/arm/arm64/mpu/head.S
+++ b/xen/arch/arm/arm64/mpu/head.S
@@ -3,83 +3,7 @@
   * Start-of-day code for an Armv8-R MPU system.
   */
  
-#include 

-#include 
-
-/* Backgroud region enable/disable */
-#define SCTLR_ELx_BR    BIT(17, UL)
-
-#define REGION_TEXT_PRBAR   0x38    /* SH=11 AP=10 XN=00 */
-#define REGION_RO_PRBAR     0x3A    /* SH=11 AP=10 XN=10 */
-#define REGION_DATA_PRBAR   0x32    /* SH=11 AP=00 XN=10 */
-#define REGION_DEVICE_PRBAR 0x22    /* SH=10 AP=00 XN=10 */
-
-#define REGION_NORMAL_PRLAR 0x0f    /* NS=0 ATTR=111 EN=1 */
-#define REGION_DEVICE_PRLAR 0x09    /* NS=0 ATTR=100 EN=1 */
-
-/*
- * Macro to prepare and set a EL2 MPU memory region.
- * We will also create an according MPU memory region entry, which
- * is a structure of pr_t,  in table \prmap.
- *
- * sel: region selector
- * base:reg storing base address
- * limit:   reg storing limit address
- * prbar:   store computed PRBAR_EL2 value
- * prlar:   store computed PRLAR_EL2 value
- * maxcount:maximum number of EL2 regions supported
- * attr_prbar:  PRBAR_EL2-related memory attributes. If not specified it will be
- *  REGION_DATA_PRBAR
- * attr_prlar:  PRLAR_EL2-related memory attributes. If not specified it will be
- *  REGION_NORMAL_PRLAR
- *
- * Preserves \maxcount
- * Output:
- *  \sel: Next available region selector index.
- * Clobbers \base, \limit, \prbar, \prlar
- *
- * Note that all parameters using registers should be distinct.
- */
-.macro prepare_xen_region, sel, base, limit, prbar, prlar, maxcount, attr_prbar=REGION_DATA_PRBAR, attr_prlar=REGION_NORMAL_PRLAR
-/* Check if the region is empty */
-cmp   \base, \limit
-beq   1f
-
-/* Check if the number of regions exceeded the count specified in MPUIR_EL2 */
-cmp   \sel, \maxcount
-bge   fail_insufficient_regions
-
-/* Prepare value for PRBAR_EL2 reg and preserve it in \prbar.*/
-and   \base, \base, #MPU_REGION_MASK
-mov   \prbar, #\attr_prbar
-orr   \prbar, \prbar, \base
-
-/* Limit address should be inclusive */
-sub   \limit, \limit, #1
-and   \limit, \limit, #MPU_REGION_MASK
-mov   \prlar, #\attr_prlar
-orr   \prlar, \prlar, \limit
-
-msr   PRSELR_EL2, \sel
-isb
-msr   PRBAR_EL2, \prbar
-msr   PRLAR_EL2, \prlar
-dsb   sy
-isb
-
-add   \sel, \sel, #1
-
-1:
-.endm
-
-/*
- * Failure caused due to insufficient MPU regions.
- */
-FUNC_LOCAL(fail_insufficient_regions)
-PRINT("- Selected MPU region is above the implemented number in MPUIR_EL2 -\r\n")
-1:  wfe
-b   1b
-END(fail_insufficient_regions)
+#include 
  
  /*

   * Enable EL2 MPU and data cache
diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h b/xen/arch/arm/include/asm/arm64/sysregs.h
index b593e4028b..dba0248c88 100644
--- a/xen/arch/arm/include/asm/arm64/sysregs.h
+++ b/xen/arch/arm/include/asm/arm64/sysregs.h
@@ -462,6 +462,17 @@
  #define ZCR_ELx_LEN_SIZE 9
  #define ZCR_ELx_LEN_MASK 0x1ff
  
+#define REGION_TEXT_PRBAR   0x38    /* SH=11 AP=10 XN=00 */
+#define REGION_RO_PRBAR     0x3A    /* SH=11 AP=10 XN=10 */
+

Re: [PATCH v3] xen/riscv: Increase XEN_VIRT_SIZE

2025-04-14 Thread Jan Beulich

On 14.04.2025 13:48, Oleksii Kurochko wrote:
> 
> On 4/10/25 10:48 AM, Jan Beulich wrote:
>> On 09.04.2025 21:01, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/include/asm/mm.h
>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>> @@ -9,6 +9,7 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include 
>>>   #include 
>>>   
>>>   #include 
>>> @@ -35,6 +36,11 @@ static inline void *maddr_to_virt(paddr_t ma)
>>>   return (void *)va;
>>>   }
>>>   
>>> +#define is_init_section(p) ({   \
>>> +char *p_ = (char *)(unsigned long)(p);  \
>>> +(p_ >= __init_begin) && (p_ < __init_end);  \
>>> +})
>> I think this wants to be put in xen/sections.h, next to where __init_{begin,end}
>> are declared. But first it wants making const-correct, to eliminate the potential
>> of it indirectly casting away const-ness from the incoming argument.
>>
>> (At some point related stuff wants moving from kernel.h to sections.h, I suppose.
>> And at that point they will all want to have const added.)
> 
> Sure, I'll change to 'const char *p_ = (const char*)(unsigned long)(p)'.

And hopefully without forgetting the blank ahead of the *.
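
For illustration, a const-correct range check could look roughly like the
sketch below. It is a user-space model: fake_init[] stands in for the
linker-provided __init_begin/__init_end symbols, and in_fake_init_section()
is a hypothetical name, written as a function rather than the macro from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the region delimited by __init_begin and __init_end. */
static const char fake_init[16];
#define fake_init_begin (&fake_init[0])
#define fake_init_end   (&fake_init[16])

/*
 * The parameter is const-qualified, so callers holding a pointer to const
 * data can pass it without a cast and nothing strips the qualifier.
 * Comparing integer values avoids relational operators on pointers that
 * may belong to unrelated objects.
 */
bool in_fake_init_section(const void *p)
{
    uintptr_t addr = (uintptr_t)p;

    return (addr >= (uintptr_t)fake_init_begin) &&
           (addr < (uintptr_t)fake_init_end);
}
```

The end bound is exclusive here, matching the `p_ < __init_end` test in
the quoted macro.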

Jan

[PATCH v2 1/2] arm: vgic: Add the ability to trigger MSIs from the Hypervisor

2025-04-14 Thread Mykyta Poturai

From: Mykyta Poturai 

Add the vgic_its_trigger_msi() function to the vgic interface. This
function allows injecting MSIs from the hypervisor into the guest,
which is useful for userspace PCI backend drivers.

Signed-off-by: Mykyta Poturai 
---
v1->v2:
* replace -1 with -ENOENT
* reduce guest memory access in vgic_its_trigger_msi
---
 xen/arch/arm/include/asm/vgic.h | 11 +++
 xen/arch/arm/vgic-v3-its.c  | 19 +++
 2 files changed, 30 insertions(+)

diff --git a/xen/arch/arm/include/asm/vgic.h b/xen/arch/arm/include/asm/vgic.h
index e309dca1ad..3d8e3a8343 100644
--- a/xen/arch/arm/include/asm/vgic.h
+++ b/xen/arch/arm/include/asm/vgic.h
@@ -318,6 +318,17 @@ extern bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int ir
 extern void vgic_check_inflight_irqs_pending(struct vcpu *v,
  unsigned int rank, uint32_t r);
 
+#ifdef CONFIG_HAS_ITS
+int vgic_its_trigger_msi(struct domain *d, paddr_t doorbell_address,
+u32 devid, u32 eventid);
+#else
+static inline int vgic_its_trigger_msi(struct domain *d, paddr_t doorbell_address,
+u32 devid, u32 eventid)
+{
+return -EOPNOTSUPP;
+}
+#endif /* CONFIG_HAS_ITS */
+
 #endif /* !CONFIG_NEW_VGIC */
 
 /*** Common VGIC functions used by Xen arch code /
diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index c65c1dbf52..be5bfe0d21 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -1484,6 +1484,25 @@ static int vgic_v3_its_init_virtual(struct domain *d, paddr_t guest_addr,
 return 0;
 }
 
+int vgic_its_trigger_msi(struct domain *d, paddr_t doorbell_address,
+u32 devid, u32 eventid)
+{
+struct pending_irq *pend;
+unsigned int vcpu_id;
+
+pend = gicv3_its_get_event_pending_irq(d,doorbell_address, devid, eventid);
+if ( !pend )
+return -ENOENT;
+
+vcpu_id = ACCESS_ONCE(pend->lpi_vcpu_id);
+if ( vcpu_id >= d->max_vcpus )
+  return -ENOENT;
+
+vgic_inject_irq(d, d->vcpu[vcpu_id], pend->irq, true);
+
+return 0;
+}
+
 unsigned int vgic_v3_its_count(const struct domain *d)
 {
 struct host_its *hw_its;
-- 
2.34.1

[PATCH v2 0/2] Add support for MSI injection on Arm

2025-04-14 Thread Mykyta Poturai

This series adds the base support for MSI injection on Arm. This is
needed to streamline virtio-pci interrupt triggering.

With these patches, MSIs can be triggered in guests by issuing the new
DM op, inject_msi2. This op is similar to inject_msi, but it allows
specifying the source id of the MSI.

We chose the approach of adding a new DM op instead of using the pad
field of inject_msi because we have no clear way of distinguishing
between set and unset pad fields. The new implementation also adds a
flags field to clearly specify if the SBDF is set.

Patches were tested on QEMU with QEMU virtio-pci backends, with 
virtio-pci patches and patches for ITS support for DomUs applied.

Branch with all relevant Xen patches:
https://github.com/Deedone/xen/tree/4.20-dev%2Bvirtio

Branch with all relevant QEMU patches:
https://github.com/Deedone/qemu/tree/virtio-msi2

Mykyta Poturai (2):
  arm: vgic: Add the ability to trigger MSIs from the Hypervisor
  xen/dm: arm: Introduce inject_msi2 DM op

 tools/include/xendevicemodel.h   | 14 ++
 tools/libs/devicemodel/core.c| 20 
 tools/libs/devicemodel/libxendevicemodel.map |  5 +
 xen/arch/arm/dm.c| 17 +
 xen/arch/arm/include/asm/vgic.h  | 11 +++
 xen/arch/arm/vgic-v3-its.c   | 19 +++
 xen/arch/x86/hvm/dm.c| 18 ++
 xen/include/public/hvm/dm_op.h   | 18 ++
 8 files changed, 122 insertions(+)

-- 
2.34.1

Re: [RFC] xen/x86: allow overlaps with non-RAM regions

2025-04-14 Thread Roger Pau Monné

On Fri, Apr 11, 2025 at 09:45:26AM -0400, Jason Andryuk wrote:
> On 2025-04-11 03:31, Roger Pau Monné wrote:
> > Thanks Jason for getting back, I'm intrigued by this issue :).
> > 
> > On Thu, Apr 10, 2025 at 04:55:54PM -0400, Jason Andryuk wrote:
> > > On 2025-04-04 06:28, Roger Pau Monné wrote:
> > > > On Thu, Apr 03, 2025 at 06:01:42PM -0700, Stefano Stabellini wrote:
> > > > > On one Sapphire AMD x86 board, I see this:
> > > > > 
> > > > > 
> > > > > (XEN) [003943ca6ff2]  [f000, f7ff] (reserved)
> > > > > (XEN) [0039460886d9]  [fd00, ] (reserved)
> > > > > [...]
> > > > > (XEN) [4.612235] :02:00.0: not mapping BAR [fea00, fea03] invalid position
> > > > > 
> > > > > 
> > > > > Linux boots fine on this platform but Linux as Dom0 on Xen does not.
> > > > > This is because the pci_check_bar->is_memory_hole check fails due to the
> > > > > MMIO region overlapping with the EFI reserved region.
> > > > 
> > > > That's weird.  (Partially) the reason to not attempt to map such BAR
> > > > is that it should already be mapped, because at dom0 creation time all
> > > > reserved regions are added to the p2m (see arch_iommu_hwdom_init()).
> > > > If that's not the case we should figure out why this reserved region
> > > > is not added to dom0 p2m as part of arch_iommu_hwdom_init().
> > > 
> > > Victor discovered these regions are type 11 EfiMemoryMappedIO, but they get
> > > converted to e820 RESERVED.  The BAR points into it.
> > > 
> > > 0f000-0f7ff type=11 attr=8000100d
> > > 0fd00-0fedf type=11 attr=8000100d
> > > 0fee0-0fee00fff type=11 attr=8001
> > > 0fee01000-0 type=11 attr=8000100d
> > > 
> > > Xenia discovered Linux keeps small regions like this reserved, but lets
> > > larger ones (>= 256kb) become holes.  See the comment in Linux
> > > arch/x86/platform/efi/efi.c:efi_remove_e820_mmio() around line 301.
> > 
> > Right, but whatever Linux decides to do with the reserved regions
> > won't affect how Xen maps them into the p2m.  Anything that's reserved
> > in the e820 should end up identity mapped in the p2m for PVH dom0,
> > unless it's being exclusively used by Xen (see
> > dom0_setup_permissions() use of iomem_deny_access() to deny dom0
> > access to some MMIO regions).
> 
> Oh, my point was more that Baremetal Linux won't have reserved ranges in
> these regions, so there would not be any BAR conflicts.  Though I'm not sure
> if it checks.
> 
> If Xen removed them from the memory map, then pci_check_bar() ->
> is_memory_hole() would pass.

Yes, it would pass.  The underlying issue however is that such region
should already be mapped in the p2m, and hence accesses shouldn't
fault.

When building dom0:

(XEN) [7.943830] *** Building a PVH Dom0 ***
(XEN) [7.955960] d0: identity mappings for IOMMU:
(XEN) [7.965494]  [a0, ff] RW
(XEN) [7.974336]  [009bff, 009fff] RW
(XEN) [7.983172]  [0cabc9, 0cc14c] RW
(XEN) [7.992049]  [0cc389, 0cc389] RW
(XEN) [8.000890]  [0cc70a, 0cd1fe] RW
(XEN) [8.010065]  [0ce000, 0c] RW
(XEN) [8.018904]  [0fd000, 0fd2ff] RW
(XEN) [8.027745]  [0fd304, 0febff] RW
(XEN) [8.036584]  [0fec02, 0fedff] RW
(XEN) [8.045546]  [0fee01, 0f] RW
(XEN) [8.054519]  [80f340, 8501ff] RW

All the ranges listed here are added to the p2m, and hence the range
[0xfea00, 0xfea03] should be covered by:

(XEN) [8.027745]  [0fd304, 0febff] RW

The expectation is that those mappings are never removed from dom0
p2m.

> > > The description of EfiMemoryMappedIO is a little confusing, which is
> > > probably why its use is unclear.
> > > 
> > > ```
> > > Table 7.5 Memory Type Usage before ExitBootServices()
> > > EfiMemoryMappedIO
> > > 
> > > Used by system firmware to request that a memory-mapped IO region be mapped
> > > by the OS to a virtual address so it can be accessed by EFI runtime
> > > services.
> > > 
> > > Table 7.6 Memory Type Usage after ExitBootServices()
> > > EfiMemoryMappedIO
> > > 
> > > This memory is not used by the OS. All system memory-mapped IO information
> > > should come from ACPI tables.
> > > ```
> > > 
> > > The two after ExitBootServices sentences seem contradictory.  I wonder if it
> > > should be "Ignore this memory type - All system memory-mapped IO information
> > > should come from ACPI tables".
> > 
> > Not very helpful indeed.  The description in "before
> > ExitBootServices()" seems more sensible to me: if the MMIO region is
> > used by runtime services Xen should ensure it's always mapped in the
> > dom0 p2m (which Xen should in principle already do).
> > 
> > > > Can you paste the dom0 build output when booted with `iommu=verbose
> > > > dom0=pvh,verbose`?
> > 
> > Would it be

Re: [PATCH 2/5] xen/io: provide helpers for multi size MMIO accesses

2025-04-14 Thread Jan Beulich

On 11.04.2025 12:54, Roger Pau Monne wrote:
> Several handlers have the same necessity of reading from an MMIO region
> using 1, 2, 4 or 8 bytes accesses.  So far this has been open-coded in the
> function itself.  Instead provide a new handler that encapsulates the
> accesses.
> 
> Since the added helpers are not architecture specific, introduce a new
> generic io.h header.

Except that ...

> --- /dev/null
> +++ b/xen/include/xen/io.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Generic helpers for doing MMIO accesses.
> + *
> + * Copyright (c) 2025 Cloud Software Group
> + */
> +#ifndef XEN_IO_H
> +#define XEN_IO_H
> +
> +#include 
> +
> +#include 
> +
> +static inline uint64_t read_mmio(const volatile void __iomem *mem,
> + unsigned int size)
> +{
> +switch ( size )
> +{
> +case 1:
> +return readb(mem);
> +
> +case 2:
> +return readw(mem);
> +
> +case 4:
> +return readl(mem);
> +
> +case 8:
> +return readq(mem);

... this and ...

> +}
> +
> +ASSERT_UNREACHABLE();
> +return ~0UL;
> +}
> +
> +static inline void write_mmio(volatile void __iomem *mem, uint64_t data,
> +  unsigned int size)
> +{
> +switch ( size )
> +{
> +case 1:
> +writeb(data, mem);
> +break;
> +
> +case 2:
> +writew(data, mem);
> +break;
> +
> +case 4:
> +writel(data, mem);
> +break;
> +
> +case 8:
> +writeq(data, mem);
> +break;

... this may (generally will) not work on 32-bit architectures. Add
CONFIG_64BIT conditionals? At which point return type / last parameter
type could move from uint64_t to unsigned long.
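
For illustration only, a minimal sketch of what such conditionals might look
like, written as a standalone userspace mock rather than the actual Xen change.
The mock_read*() helpers stand in for readb()/readw()/readl()/readq(), and
CONFIG_64BIT is derived locally here as an assumption (Xen would take it from
Kconfig):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption: derive the symbol locally; Xen would get it from Kconfig. */
#if ULONG_MAX > 0xffffffffUL
# define CONFIG_64BIT 1
#endif

/* Hypothetical stand-ins for the real MMIO accessors. */
static inline uint8_t mock_readb(const volatile void *p)
{
    return *(const volatile uint8_t *)p;
}

static inline uint16_t mock_readw(const volatile void *p)
{
    return *(const volatile uint16_t *)p;
}

static inline uint32_t mock_readl(const volatile void *p)
{
    return *(const volatile uint32_t *)p;
}

#ifdef CONFIG_64BIT
static inline uint64_t mock_readq(const volatile void *p)
{
    return *(const volatile uint64_t *)p;
}
#endif

/* Return type narrowed to unsigned long; 8-byte case only on 64-bit. */
static inline unsigned long read_mmio_sketch(const volatile void *mem,
                                             unsigned int size)
{
    switch ( size )
    {
    case 1:
        return mock_readb(mem);

    case 2:
        return mock_readw(mem);

    case 4:
        return mock_readl(mem);

#ifdef CONFIG_64BIT
    case 8:
        return mock_readq(mem);
#endif
    }

    /* Stand-in for ASSERT_UNREACHABLE(). */
    assert(!"unsupported access size");
    return ~0UL;
}
```

With this shape a 32-bit build simply has no 8-byte case, so such a request
ends up on the unreachable path instead of referencing an accessor that may
not exist there.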

As to the top comment of the file: io.h is, to me at least, wider than
just dealing with MMIO accesses. IOW I fear that sentence may go stale
at some point, without anyone paying attention.

Jan

Re: [PATCH v1 08/14] xen/riscv: imsic_init() implementation

2025-04-14 Thread Jan Beulich

On 08.04.2025 17:57, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/arch/riscv/imsic.c
> @@ -0,0 +1,286 @@
> +/* SPDX-License-Identifier: MIT */
> +
> +/*
> + * xen/arch/riscv/imsic.c
> + *
> + * RISC-V Incoming MSI Controller support
> + *
> + * (c) 2023 Microchip Technology Inc.
> + * (c) 2024 Vates

No 2025 here (if already the years matter)?

> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +static struct imsic_config imsic_cfg;
> +
> +const struct imsic_config *imsic_get_config(void)

Does this need to return a pointer to non-const?

> +{
> +return &imsic_cfg;
> +}
> +
> +static int __init imsic_get_parent_hartid(struct dt_device_node *node,
> +  unsigned int index,
> +  unsigned long *hartid)
> +{
> +int res;
> +unsigned long hart;
> +struct dt_phandle_args args;
> +
> +/* Try the new-style interrupts-extended first */

The comment says "first", but then ...

> +res = dt_parse_phandle_with_args(node, "interrupts-extended",
> + "#interrupt-cells", index, &args);
> +if ( !res )
> +{
> +res = riscv_of_processor_hartid(args.np->parent, &hart);
> +if ( res < 0 )
> +return -EINVAL;
> +
> +*hartid = hart;
> +}
> +return res;
> +}

... nothing else is being tried.

Also, nit: Blank line please ahead of the main "return" of a function.

Further - any particular reason to discard riscv_of_processor_hartid()'s
error code on the error path?
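
To illustrate the question, a minimal standalone sketch of passing the error
through unchanged. mock_processor_hartid() is a hypothetical stand-in for
riscv_of_processor_hartid(), and the -ENODEV it returns as well as the hart
number 3 are assumptions for the example only:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for riscv_of_processor_hartid(). */
static int mock_processor_hartid(int simulate_error, unsigned long *hart)
{
    if ( simulate_error )
        return -ENODEV; /* assumed error value, for illustration */

    *hart = 3;

    return 0;
}

static int get_parent_hartid_sketch(int simulate_error, unsigned long *hartid)
{
    unsigned long hart;
    int res;

    assert(hartid != NULL);

    res = mock_processor_hartid(simulate_error, &hart);
    if ( res < 0 )
        return res; /* propagate the callee's code instead of -EINVAL */

    *hartid = hart;

    return 0;
}
```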

> +
> +

Nit: No double blank lines please (and I wish I wouldn't need to repeat
this any further).

> +static int imsic_parse_node(struct dt_device_node *node,
> + unsigned int *nr_parent_irqs)
> +{
> +int rc;
> +unsigned int tmp;
> +paddr_t base_addr;
> +
> +/* Find number of parent interrupts */
> +*nr_parent_irqs = dt_number_of_irq(node);
> +if ( !*nr_parent_irqs )
> +{
> +printk(XENLOG_ERR "%s: no parent irqs available\n", node->name);
> +return -ENOENT;
> +}
> +
> +/* Find number of guest index bits in MSI address */
> +rc = dt_property_read_u32(node, "riscv,guest-index-bits",
> +  &imsic_cfg.guest_index_bits);
> +if ( !rc )

It is confusing to store a bool return value in a local "int" variable,
just to then use it as boolean. Is the local var needed at all here?
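
A minimal sketch of the shape this could take without the local variable.
mock_property_read_u32() is a hypothetical stand-in for dt_property_read_u32(),
assumed here to return true only when the property is present:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: a NULL prop models a missing DT property. */
static bool mock_property_read_u32(const uint32_t *prop, uint32_t *out)
{
    if ( prop == NULL )
        return false;

    *out = *prop;

    return true;
}

static uint32_t guest_index_bits_sketch(const uint32_t *prop)
{
    uint32_t bits;

    /* Use the boolean result directly; no intermediate "int rc". */
    if ( !mock_property_read_u32(prop, &bits) )
        bits = 0;

    return bits;
}
```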

> +imsic_cfg.guest_index_bits = 0;
> +tmp = BITS_PER_LONG - IMSIC_MMIO_PAGE_SHIFT;
> +if ( tmp < imsic_cfg.guest_index_bits )
> +{
> +printk(XENLOG_ERR "%s: guest index bits too big\n", node->name);
> +return -ENOENT;
> +}
> +
> +/* Find number of HART index bits */
> +rc = dt_property_read_u32(node, "riscv,hart-index-bits",
> +  &imsic_cfg.hart_index_bits);
> +if ( !rc )
> +{
> +/* Assume default value */
> +imsic_cfg.hart_index_bits = fls(*nr_parent_irqs);
> +if ( BIT(imsic_cfg.hart_index_bits, UL) < *nr_parent_irqs )
> +imsic_cfg.hart_index_bits++;
> +}
> +tmp = BITS_PER_LONG - IMSIC_MMIO_PAGE_SHIFT -
> +  imsic_cfg.guest_index_bits;

tmp -= imsic_cfg.guest_index_bits;

? (And then similarly further down.)
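
Spelled out as a standalone sketch, the idea is to treat tmp as a running
budget of remaining address bits. The SKETCH_* names are hypothetical, and the
page shift of 12 is an assumption standing in for IMSIC_MMIO_PAGE_SHIFT:

```c
#include <errno.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG   ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define SKETCH_MMIO_PAGE_SHIFT 12U /* assumption, for illustration */

static int check_index_bits_sketch(unsigned int guest_bits,
                                   unsigned int hart_bits,
                                   unsigned int group_bits)
{
    /* Bits left above the MMIO page offset. */
    unsigned int tmp = SKETCH_BITS_PER_LONG - SKETCH_MMIO_PAGE_SHIFT;

    if ( tmp < guest_bits )
        return -ENOENT;

    /* Shrink the budget rather than recomputing it from scratch. */
    tmp -= guest_bits;
    if ( tmp < hart_bits )
        return -ENOENT;

    tmp -= hart_bits;
    if ( tmp < group_bits )
        return -ENOENT;

    return 0;
}
```

Each subtraction is safe because the preceding check guarantees tmp is at
least as large as the value being subtracted.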

> +if ( tmp < imsic_cfg.hart_index_bits )
> +{
> +printk(XENLOG_ERR "%s: HART index bits too big\n", node->name);
> +return -ENOENT;
> +}
> +
> +/* Find number of group index bits */
> +rc = dt_property_read_u32(node, "riscv,group-index-bits",
> +  &imsic_cfg.group_index_bits);
> +if ( !rc )
> +imsic_cfg.group_index_bits = 0;
> +tmp = BITS_PER_LONG - IMSIC_MMIO_PAGE_SHIFT -
> +  imsic_cfg.guest_index_bits - imsic_cfg.hart_index_bits;
> +if ( tmp < imsic_cfg.group_index_bits )
> +{
> +printk(XENLOG_ERR "%s: group index bits too big\n", node->name);
> +return -ENOENT;
> +}
> +
> +/* Find first bit position of group index */
> +tmp = IMSIC_MMIO_PAGE_SHIFT * 2;
> +rc = dt_property_read_u32(node, "riscv,group-index-shift",
> +  &imsic_cfg.group_index_shift);
> +if ( !rc )
> +imsic_cfg.group_index_shift = tmp;
> +if ( imsic_cfg.group_index_shift < tmp )
> +{
> +printk(XENLOG_ERR "%s: group index shift too small\n", node->name);
> +return -ENOENT;
> +}
> +tmp = imsic_cfg.group_index_bits + imsic_cfg.group_index_shift - 1;
> +if ( tmp >= BITS_PER_LONG )
> +{
> +printk(XENLOG_ERR "%s: group index shift too big\n", node->name);
> +return -EINVAL;
> +}
> +
> +/* Find number of interrupt identities */
> +rc = dt_property_read_u32(node, "riscv,num-ids", &imsic_cfg.nr_ids);
> +if ( !rc )
> +{
> +printk(XENLOG_ERR "%s: number of interrupt identities not found\n",
> +

[PATCH v4 01/15] xen/cpufreq: move "init" flag into common structure

2025-04-14 Thread Penny Zheng

AMD cpufreq cores will be initialized in two modes, legacy P-state mode,
and CPPC mode. So "init" flag shall be extracted from specific
"struct xen_processor_perf", and placed in the common
"struct processor_pminfo".

Signed-off-by: Penny Zheng 
---
v3 -> v4:
- new commit
---
 xen/drivers/acpi/pmstat.c | 4 ++--
 xen/drivers/cpufreq/cpufreq.c | 8 
 xen/include/acpi/cpufreq/processor_perf.h | 4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index c51b9ca358..767594908c 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -68,7 +68,7 @@ int do_get_pm_info(struct xen_sysctl_get_pmstat *op)
 return -ENODEV;
 if ( hwp_active() )
 return -EOPNOTSUPP;
-if ( !pmpt || !(pmpt->perf.init & XEN_PX_INIT) )
+if ( !pmpt || !(pmpt->init & XEN_PX_INIT) )
 return -EINVAL;
 break;
 default:
@@ -463,7 +463,7 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
 case CPUFREQ_PARA:
 if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
 return -ENODEV;
-if ( !pmpt || !(pmpt->perf.init & XEN_PX_INIT) )
+if ( !pmpt || !(pmpt->init & XEN_PX_INIT) )
 return -EINVAL;
 break;
 }
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 4a103c6de9..b01ed8e294 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -209,7 +209,7 @@ int cpufreq_add_cpu(unsigned int cpu)
 
 perf = &processor_pminfo[cpu]->perf;
 
-if ( !(perf->init & XEN_PX_INIT) )
+if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
 return -EINVAL;
 
 if (!cpufreq_driver.init)
@@ -367,7 +367,7 @@ int cpufreq_del_cpu(unsigned int cpu)
 
 perf = &processor_pminfo[cpu]->perf;
 
-if ( !(perf->init & XEN_PX_INIT) )
+if ( !(processor_pminfo[cpu]->init & XEN_PX_INIT) )
 return -EINVAL;
 
 if (!per_cpu(cpufreq_cpu_policy, cpu))
@@ -563,7 +563,7 @@ int set_px_pminfo(uint32_t acpi_id, struct xen_processor_performance *perf)
 if ( cpufreq_verbose )
 print_PPC(pxpt->platform_limit);
 
-if ( pxpt->init == XEN_PX_INIT )
+if ( pmpt->init == XEN_PX_INIT )
 {
 ret = cpufreq_limit_change(cpu);
 goto out;
@@ -572,7 +572,7 @@ int set_px_pminfo(uint32_t acpi_id, struct xen_processor_performance *perf)
 
 if ( perf->flags == ( XEN_PX_PCT | XEN_PX_PSS | XEN_PX_PSD | XEN_PX_PPC ) )
 {
-pxpt->init = XEN_PX_INIT;
+pmpt->init = XEN_PX_INIT;
 
 ret = cpufreq_cpu_init(cpu);
 goto out;
diff --git a/xen/include/acpi/cpufreq/processor_perf.h b/xen/include/acpi/cpufreq/processor_perf.h
index 301104e16f..5f2612b15a 100644
--- a/xen/include/acpi/cpufreq/processor_perf.h
+++ b/xen/include/acpi/cpufreq/processor_perf.h
@@ -29,14 +29,14 @@ struct processor_performance {
 struct xen_processor_px *states;
 struct xen_psd_package domain_info;
 uint32_t shared_type;
-
-uint32_t init;
 };
 
 struct processor_pminfo {
 uint32_t acpi_id;
 uint32_t id;
 struct processor_performance perf;
+
+uint32_t init;
 };
 
 extern struct processor_pminfo *processor_pminfo[NR_CPUS];
-- 
2.34.1
