Re: [PATCH v10 08/12] xen/page_alloc: introduce preserved page flags macro

2024-12-02 Thread Andrea Bastoni
Hi,

On 29/11/2024 12:09, Jan Beulich wrote:
> On 29.11.2024 10:32, Carlo Nonato wrote:
>> Hi Jan,
>>
>> On Thu, Nov 28, 2024 at 12:05 PM Jan Beulich  wrote:
>>>
>>> On 19.11.2024 15:13, Carlo Nonato wrote:
 PGC_static, PGC_extra and PGC_broken need to be preserved when assigning a
 page. Define a new macro that groups those flags and use it instead of 
 or'ing
 every time.

 To make preserved flags even more meaningful, they are kept also when
 switching state in mark_page_free().
 Enforce the removal of PGC_extra before freeing domain pages as this is
 considered an error and can cause ASSERT violations.

 Signed-off-by: Carlo Nonato 
 ---
 v10:
 - fixed commit message
 v9:
 - add PGC_broken to PGC_preserved
 - clear PGC_extra in alloc_domheap_pages() only if MEMF_no_refcount is set
 v8:
 - fixed PGC_extra ASSERT fail in alloc_domheap_pages() by removing 
 PGC_extra
   before freeing
 v7:
 - PGC_preserved used also in mark_page_free()
 v6:
 - preserved_flags renamed to PGC_preserved
 - PGC_preserved is used only in assign_pages()
 v5:
 - new patch
 ---
  xen/common/page_alloc.c | 19 ++-
  1 file changed, 14 insertions(+), 5 deletions(-)

 diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
 index 7b911b5ed9..34cd473150 100644
 --- a/xen/common/page_alloc.c
 +++ b/xen/common/page_alloc.c
 @@ -160,6 +160,7 @@
  #endif

  #define PGC_no_buddy_merge PGC_static
 +#define PGC_preserved (PGC_extra | PGC_static | PGC_broken)

  #ifndef PGT_TYPE_INFO_INITIALIZER
  #define PGT_TYPE_INFO_INITIALIZER 0
 @@ -1427,12 +1428,11 @@ static bool mark_page_free(struct page_info *pg, 
 mfn_t mfn)
  {
  case PGC_state_inuse:
  BUG_ON(pg->count_info & PGC_broken);
 -pg->count_info = PGC_state_free;
 +pg->count_info = PGC_state_free | (pg->count_info & 
 PGC_preserved);
  break;
>>>
>>> PGC_extra doesn't want preserving here. Since it's mark_page_free(), and
>>> since PGC_extra is removed before freeing, the state change is apparently
>>> fine. But an assertion may want adding, for documentation purposes if
>>> nothing else.
>>>
>>> Alternatively it may make sense to indeed exclude PGC_extra here. In fact
>>> PGC_static doesn't need using here either, as unprepare_staticmem_pages()
>>> will explicitly set it again anyway.
>>>
>>> Hence I wonder whether the change here really is necessary (one will then
>>> be needed in the next patch aiui, when PGC_colored is introduced). Which
>>> would then eliminate the need for the final two hunks of the patch, I
>>> think.
>>>
  case PGC_state_offlining:
 -pg->count_info = (pg->count_info & PGC_broken) |
 - PGC_state_offlined;
 +pg->count_info = (pg->count_info & PGC_preserved) | 
 PGC_state_offlined;
  pg_offlined = true;
  break;
>>>
>>> I'm similarly unconvinced that anything other than PGC_broken (and
>>> subsequently perhaps PGC_colored) would need preserving here.
>>
>> Yes, we (re)checked the code and also believe that the introduction of
>> PGC_preserved is generating more confusion (and code) then the actual logical
>> help it provides.
>>
>> We'll remove this patch and integrate PGC_colored explicitly in the flags to
>> be preserved. This avoid the clumsy logic of preserving something (extra)
>> when it's not needed and then handling the special case to remove it
>> explicitly.
>> Basically my goal is to go back to v4 where this patch didn't exist.
> 
> Hmm, no, I don't think I said anything in the direction of removing 
> PGC_preserved
> again. I merely think you went too far in where it actually wants using.

Let me try to better detail the intention of what we have in mind.
Here again parts of this patch:

> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 7b911b5ed9..34cd473150 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -160,6 +160,7 @@
>  #endif
> 
>  #define PGC_no_buddy_merge PGC_static
> +#define PGC_preserved (PGC_extra | PGC_static | PGC_broken)

The granularity of preserved as implemented by v10 is different than what you 
require...

> 
>  #ifndef PGT_TYPE_INFO_INITIALIZER
>  #define PGT_TYPE_INFO_INITIALIZER 0
> @@ -1427,12 +1428,11 @@ static bool mark_page_free(struct page_info *pg, 
> mfn_t mfn)
>  {
>  case PGC_state_inuse:
>  BUG_ON(pg->count_info & PGC_broken);
> -pg->count_info = PGC_state_free;
> +pg->count_info = PGC_state_free | (pg->count_info & PGC_preserved);

here (no need),

>  break;
> 
>  case PGC_state_offlining:
> -pg->count_info = (pg->count_info & PGC_broken) |
> - PGC_state_offlined;
> +pg->count_info = (pg->count_info & PGC_preserved) | 
> PGC_state

[PATCH v4 2/5] ARM: pxa: Switch to new sys-off handler API

2024-12-02 Thread Andrew Davis
Kernel now supports chained power-off handlers. Use
register_platform_power_off() that registers a platform level power-off
handler. Legacy pm_power_off() will be removed once all drivers and archs
are converted to the new sys-off API.

Signed-off-by: Andrew Davis 
---
 arch/arm/mach-pxa/spitz.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-pxa/spitz.c b/arch/arm/mach-pxa/spitz.c
index 33533e35720f8..c0b1f7e6be874 100644
--- a/arch/arm/mach-pxa/spitz.c
+++ b/arch/arm/mach-pxa/spitz.c
@@ -1096,7 +1096,7 @@ static void __init spitz_init(void)
software_node_register(&spitz_scoop_2_gpiochip_node);
 
init_gpio_reset(SPITZ_GPIO_ON_RESET, 1, 0);
-   pm_power_off = spitz_poweroff;
+   register_platform_power_off(spitz_poweroff);
 
PMCR = 0x00;
 
-- 
2.39.2




[PATCH v4 5/5] arm/xen: Switch to new sys-off handler API

2024-12-02 Thread Andrew Davis
Kernel now supports chained power-off handlers. Use
register_platform_power_off() that registers a platform level power-off
handler. Legacy pm_power_off() will be removed once all drivers and archs
are converted to the new sys-off API.

Signed-off-by: Andrew Davis 
---
 arch/arm/xen/enlighten.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index a395b6c0aae2a..8655bc3d36347 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -541,7 +541,7 @@ static int __init xen_late_init(void)
if (!xen_domain())
return -ENODEV;
 
-   pm_power_off = xen_power_off;
+   register_platform_power_off(xen_power_off);
register_restart_handler(&xen_restart_nb);
if (!xen_initial_domain()) {
struct timespec64 ts;
-- 
2.39.2




[PATCH v4 3/5] ARM: sa1100: Switch to new sys-off handler API

2024-12-02 Thread Andrew Davis
Kernel now supports chained power-off handlers. Use
register_platform_power_off() that registers a platform level power-off
handler. Legacy pm_power_off() will be removed once all drivers and archs
are converted to the new sys-off API.

Signed-off-by: Andrew Davis 
---
 arch/arm/mach-sa1100/generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-sa1100/generic.c b/arch/arm/mach-sa1100/generic.c
index 0c586047d130f..5383a26f51169 100644
--- a/arch/arm/mach-sa1100/generic.c
+++ b/arch/arm/mach-sa1100/generic.c
@@ -298,7 +298,7 @@ static struct platform_device *sa11x0_devices[] __initdata 
= {
 static int __init sa1100_init(void)
 {
struct resource wdt_res = DEFINE_RES_MEM(0x9000, 0x20);
-   pm_power_off = sa1100_power_off;
+   register_platform_power_off(sa1100_power_off);
 
regulator_has_full_constraints();
 
-- 
2.39.2




[PATCH v4 0/5] Switch more ARM plats to sys-off handler API

2024-12-02 Thread Andrew Davis
Hello all,

Continuing the quest to remove the legacy pm_power_off() global
function handler. Remove uses from arch/arm/ using the helper
register_platform_power_off().

Thanks,
Andrew

Changes for v4:
 - Drop already taken patches
 - Rebase on latest master

Changes for v3:
 - Rebase on v6.12-rc1

Changes for v2:
 - Collect Reviewed/Acked-bys
 - Rebase on v6.11-rc1

Andrew Davis (5):
  ARM: highbank: Switch to new sys-off handler API
  ARM: pxa: Switch to new sys-off handler API
  ARM: sa1100: Switch to new sys-off handler API
  ARM: vt8500: Switch to new sys-off handler API
  arm/xen: Switch to new sys-off handler API

 arch/arm/mach-highbank/highbank.c | 2 +-
 arch/arm/mach-pxa/spitz.c | 2 +-
 arch/arm/mach-sa1100/generic.c| 2 +-
 arch/arm/mach-vt8500/vt8500.c | 2 +-
 arch/arm/xen/enlighten.c  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

-- 
2.39.2




[PATCH v4 4/5] ARM: vt8500: Switch to new sys-off handler API

2024-12-02 Thread Andrew Davis
Kernel now supports chained power-off handlers. Use
register_platform_power_off() that registers a platform level power-off
handler. Legacy pm_power_off() will be removed once all drivers and archs
are converted to the new sys-off API.

Signed-off-by: Andrew Davis 
---
 arch/arm/mach-vt8500/vt8500.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-vt8500/vt8500.c b/arch/arm/mach-vt8500/vt8500.c
index 0ab40087ae1cc..1d294255d7083 100644
--- a/arch/arm/mach-vt8500/vt8500.c
+++ b/arch/arm/mach-vt8500/vt8500.c
@@ -141,7 +141,7 @@ static void __init vt8500_init(void)
pr_err("%s:ioremap(power_off) failed\n", __func__);
}
if (pmc_base)
-   pm_power_off = &vt8500_power_off;
+   register_platform_power_off(vt8500_power_off);
else
pr_err("%s: PMC Hibernation register could not be remapped, not 
enabling power off!\n", __func__);
 }
-- 
2.39.2




[PATCH v4 1/5] ARM: highbank: Switch to new sys-off handler API

2024-12-02 Thread Andrew Davis
Kernel now supports chained power-off handlers. Use
register_platform_power_off() that registers a platform level power-off
handler. Legacy pm_power_off() will be removed once all drivers and archs
are converted to the new sys-off API.

Signed-off-by: Andrew Davis 
Reviewed-by: Andre Przywara 
---
 arch/arm/mach-highbank/highbank.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-highbank/highbank.c 
b/arch/arm/mach-highbank/highbank.c
index 5d4f977ac7d2a..47335c7dadf8d 100644
--- a/arch/arm/mach-highbank/highbank.c
+++ b/arch/arm/mach-highbank/highbank.c
@@ -143,7 +143,7 @@ static void __init highbank_init(void)
sregs_base = of_iomap(np, 0);
WARN_ON(!sregs_base);
 
-   pm_power_off = highbank_power_off;
+   register_platform_power_off(highbank_power_off);
highbank_pm_init();
 
bus_register_notifier(&platform_bus_type, &highbank_platform_nb);
-- 
2.39.2




Re: [PATCH v7 10/10] tools/x86: Synthesise domain topologies

2024-12-02 Thread Jan Beulich
On 21.10.2024 17:46, Alejandro Vallejo wrote:
> Expose sensible topologies in leaf 0xb. At the moment it synthesises
> non-HT systems, in line with the previous code intent.
> 
> Leaf 0xb in the host policy is no longer zapped and the guest {max,def}
> policies have their topology leaves zapped instead. The intent is for
> toolstack to populate them. There's no current use for the topology
> information in the host policy, but it makes no harm.

How does this (and hence ...

> @@ -619,6 +616,9 @@ static void __init calculate_pv_max_policy(void)
>  recalculate_xstate(p);
>  
>  p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */
> +
> +/* Wipe host topology. Populated by toolstack */
> +memset(p->topo.raw, 0, sizeof(p->topo.raw));
>  }
>  
>  static void __init calculate_pv_def_policy(void)
> @@ -785,6 +785,9 @@ static void __init calculate_hvm_max_policy(void)
>  
>  /* It's always possible to emulate CPUID faulting for HVM guests */
>  p->platform_info.cpuid_faulting = true;
> +
> +/* Wipe host topology. Populated by toolstack */
> +memset(p->topo.raw, 0, sizeof(p->topo.raw));
>  }

... these, at least comment-wise) fit with Dom0 also needing some data
there?

Also nit: Multi-sentence comments want full stops after every sentence.

Jan



Re: [PATCH] xen: pcpu: remove unnecessary __ref annotation

2024-12-02 Thread Juergen Gross

On 02.12.24 09:59, Sergio Miguéns Iglesias wrote:

The __ref annotation has been there since the beginning of time, but no
calls to __init functions exist inside it, and the compilation of the
Xen driver does not output any warnings when removed.


The __ref annotation was added with commit 6c847402e1c6, but I think the
need for it went away some time later.



Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Oleksandr Tyshchenko 
Cc: xen-devel@lists.xenproject.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Sergio Miguéns Iglesias 


Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Feature freeze date for Xen 4.20 is Fri Dec 20, 2024

2024-12-02 Thread oleksii . kurochko
Hello everyone,

The feature freeze date for Xen 4.20 is December 20, 2024.

Patches adding new features must be committed by this date.
Straightforward bug fixes may continue to be accepted by maintainers
beyond this point.

If you would like your features included in this release, please ensure
they are committed by the deadline.

Thank you in advance, and have a great week!

Best regards,
  Oleksii





Re: [PATCH v2 1/4] common: remove -fno-stack-protector from EMBEDDED_EXTRA_CFLAGS

2024-12-02 Thread Andrew Cooper
On 02/12/2024 8:06 am, Jan Beulich wrote:
> On 30.11.2024 02:10, Volodymyr Babchuk wrote:
>> This patch is preparation for making stack protector
>> configurable. First step is to remove -fno-stack-protector flag from
>> EMBEDDED_EXTRA_CFLAGS so separate projects (Hypervisor in this case)
>> can enable/disable this feature by themselves.
> s/projects/components/ ?
>
>> --- a/stubdom/Makefile
>> +++ b/stubdom/Makefile
>> @@ -54,6 +54,8 @@ TARGET_CFLAGS += $(CFLAGS)
>>  TARGET_CPPFLAGS += $(CPPFLAGS)
>>  $(call cc-options-add,TARGET_CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
>>  
>> +$(call cc-option-add,TARGET_CFLAGS,CC,-fno-stack-protector)
>> +
>>  # Do not use host headers and libs
>>  GCC_INSTALL = $(shell LANG=C gcc -print-search-dirs | sed -n -e 's/install: 
>> \(.*\)/\1/p')
>>  TARGET_CPPFLAGS += -U __linux__ -U __FreeBSD__ -U __sun__
>> --- a/tools/firmware/Rules.mk
>> +++ b/tools/firmware/Rules.mk
>> @@ -15,6 +15,8 @@ $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
>>  
>>  $(call cc-option-add,CFLAGS,CC,-fcf-protection=none)
>>  
>> +$(call cc-option-add,CFLAGS,CC,-fno-stack-protector)
>> +
>>  # Do not add the .note.gnu.property section to any of the firmware objects: 
>> it
>>  # breaks the rombios binary and is not useful for firmware anyway.
>>  $(call cc-option-add,CFLAGS,CC,-Wa$$(comma)-mx86-used-note=no)
>> --- a/tools/tests/x86_emulator/testcase.mk
>> +++ b/tools/tests/x86_emulator/testcase.mk
>> @@ -4,6 +4,8 @@ include $(XEN_ROOT)/tools/Rules.mk
>>  
>>  $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
>>  
>> +$(call cc-option-add,CFLAGS,CC,-fno-stack-protector)
> Is use of cc-option-add really necessary throughout here, when ...
>
>> --- a/xen/Makefile
>> +++ b/xen/Makefile
>> @@ -432,6 +432,8 @@ else
>>  CFLAGS_UBSAN :=
>>  endif
>>  
>> +CFLAGS += -fno-stack-protector
> ... is isn't needed here? Iirc the compiler version ranges supported don't
> vary between components. Then again afaics $(EMBEDDED_EXTRA_CFLAGS) is used
> by x86 only right now, and with cc-options-add, so perhaps it (a) needs
> using cc-options-add here, too, and (b) it wants explaining why this needs
> generalizing from x86 to all architectures. Quite possibly hypervisor use
> of $(EMBEDDED_EXTRA_CFLAGS) may want generalizing separately, up front?

EMBEDDED_EXTRA_CFLAGS uses cc-*-add because some options are (/were) not
accepted by compilers.  Notably -fno-stack-protector-all (found from v1
of this series), and prior to that, -no-pie which as I recall is an LD
option not a CC option.

All supported compilers know -fno-stack-protector (found when checking
-fno-stack-protector-all) so it can be added to plain CFLAGS everywhere,
not only in xen/

~Andrew



Re: [PATCH 08/15] x86/hyperlaunch: locate dom0 kernel with hyperlaunch

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> Look for a subnode of type `multiboot,kernel` within a domain node. If found,
> process the reg property for the MB1 module index. If the bootargs property is
> present and there was not an MB1 string, then use the command line from the
> device tree definition.

Why specifically MB1?

> --- a/xen/arch/x86/domain_builder/core.c
> +++ b/xen/arch/x86/domain_builder/core.c
> @@ -56,6 +56,18 @@ void __init builder_init(struct boot_info *bi)
>  
>  printk(XENLOG_INFO "  Number of domains: %d\n", bi->nr_domains);
>  }
> +else
> +{
> +int i;

Plain int when ...

> +/* Find first unknown boot module to use as Dom0 kernel */
> +printk("Falling back to using first boot module as dom0\n");
> +i = first_boot_module_index(bi, BOOTMOD_UNKNOWN);
> +bi->mods[i].type = BOOTMOD_KERNEL;
> +bi->domains[0].kernel = &bi->mods[i];
> +bi->nr_domains = 1;
> +}

... it's used as array index (and there's no check for the function return
value being negative)?

> --- a/xen/arch/x86/domain_builder/fdt.c
> +++ b/xen/arch/x86/domain_builder/fdt.c
> @@ -14,6 +14,122 @@
>  
>  #include "fdt.h"
>  
> +static inline int __init fdt_get_prop_as_reg(

What does "reg" stand for here?

> +const void *fdt, int node, const char *name, unsigned int ssize,
> +unsigned int asize, uint64_t *size, uint64_t *addr)
> +{
> +int ret;
> +const struct fdt_property *prop;
> +fdt32_t *cell;
> +
> +/* FDT spec max size is 4 (128bit int), but largest arch int size is 64 
> */
> +if ( ssize > 2 || asize > 2 )
> +return -EINVAL;
> +
> +prop = fdt_get_property(fdt, node, name, &ret);
> +if ( !prop || ret < sizeof(u32) )
> +return ret < 0 ? ret : -EINVAL;
> +
> +/* read address field */
> +cell = (fdt32_t *)prop->data;
> +
> +if ( asize == 1 )
> +{
> +uint32_t val;
> +fdt_cell_as_u32(cell, &val);
> +*addr = (uint64_t)val;

No need for a cast here nor ...

> +}
> +else
> +fdt_cell_as_u64(cell, addr);
> +
> +/* read size field */
> +cell += asize;
> +
> +if ( ssize == 1 )
> +{
> +uint32_t val;
> +fdt_cell_as_u32(cell, &val);
> +*size = (uint64_t)val;

... here?

> +}
> +else
> +fdt_cell_as_u64(cell, size);
> +
> +return 0;
> +}

This whole function reads very much like a library one. Does it really need
adding here, rather than to the FDT library code we already have? In any
event there's nothing x86-specific about it, afaics.

> +static int __init dom0less_module_node(
> +void *fdt, int node, int size_size, int address_size)

Three times plain int, when ...

> +{
> +uint64_t size, addr;
> +int ret = fdt_get_prop_as_reg(fdt, node, "reg", size_size, address_size,

... two get converted to unsigned int in the course of the function call
here?

> +  &size, &addr);
> +/* An FDT error value may have been returned, translate to -EINVAL */
> +if ( ret < 0 )
> +return -EINVAL;
> +
> +if ( size != 0 )
> +return -EOPNOTSUPP;

Not knowing much about DT: What does 0 represent here?

> +if ( addr > MAX_NR_BOOTMODS )
> +return -ERANGE;
> +
> +/*
> + * MAX_NR_BOOTMODS cannot exceed the max for MB1, represented by a u32,
> + * thus the cast down to a u32 will be safe due to the prior check.
> + */
> +return (int)addr;

Comment and cast contradict one another. DYM u32 (really: uint32_t), or plain
int? If you mean to return a plain int (for the sake of the -errno values
further up), MAX_NR_BOOTMODS needs to stay below 2**31.

> +static int __init process_domain_node(
> +struct boot_info *bi, void *fdt, int dom_node)
> +{
> +int node;
> +struct boot_domain *bd = &bi->domains[bi->nr_domains];
> +const char *name = fdt_get_name(fdt, dom_node, NULL);
> +int address_size = fdt_address_cells(fdt, dom_node);
> +int size_size = fdt_size_cells(fdt, dom_node);
> +
> +if ( address_size < 0 || size_size < 0 )
> +{
> +printk("  failed processing #address or #size for domain %s)\n",
> +   name == NULL ? "unknown" : name);
> +return -EINVAL;
> +}
> +
> +fdt_for_each_subnode(node, fdt, dom_node)
> +{
> +if ( fdt_node_check_compatible(fdt, node, "multiboot,kernel") == 0 )
> +{
> +int idx = dom0less_module_node(fdt, node, size_size, 
> address_size);
> +if ( idx < 0 )
> +{
> +printk("  failed processing kernel module for domain %s)\n",
> +   name == NULL ? "unknown" : name);
> +return idx;
> +}
> +
> +if ( idx > bi->nr_modules )
> +{
> +printk("  invalid kernel module index for domain node 
> (%d)\n",
> +   bi->nr_domains);
> +return -EINVAL;
> +

Re: [PATCH v2 1/2] xl: Keep monitoring suspended domain

2024-12-02 Thread Jan Beulich
On 28.11.2024 17:45, Anthony PERARD wrote:
> On Tue, Nov 26, 2024 at 12:19:40PM -0500, Jason Andryuk wrote:
>> When a VM transitioned to LIBXL_SHUTDOWN_REASON_SUSPEND, the xl daemon
>> was exiting as 0 = DOMAIN_RESTART_NONE "No domain restart".
>> Later, when the VM actually shutdown, the missing xl daemon meant the
>> domain wasn't cleaned up properly.
>>
>> Add a new DOMAIN_RESTART_SUSPENDED to handle the case.  The xl daemon
>> keeps running to react to future shutdown events.
>>
>> The domain death event needs to be re-enabled to catch subsequent
>> events.  The libxl_evgen_domain_death is moved from death_list to
>> death_reported, and then it isn't found on subsequent iterations through
>> death_list.  We enable the new event before disabling the old event, to
>> keep the xenstore watch active.  If it is unregistered and
>> re-registered, it'll fire immediately for our suspended domain which
>> will end up continuously re-triggering.
>>
>> Signed-off-by: Jason Andryuk 
> 
> Reviewed-by: Anthony PERARD 

While committing I was wondering: Does this want/need backporting (and hence
was it perhaps lacking a Fixes: tag)?

Jan



Re: [PATCH v2 4/4] xen: riscv: enable stack protector feature

2024-12-02 Thread oleksii . kurochko
On Mon, 2024-12-02 at 09:12 +0100, Jan Beulich wrote:
> On 30.11.2024 02:10, Volodymyr Babchuk wrote:
> > Enable previously added CONFIG_STACK_PROTECTOR feature for RISC-V
> > platform. Here we can call boot_stack_chk_guard_setup() in
> > start_xen()
> > function, because it never returns, so stack protector code will
> > not
> > be triggered because of changed canary.
> > 
> > Signed-off-by: Volodymyr Babchuk 
> > Tested-by: Oleksii Kurochko 
> 
> Isn't this premature? For ...
> 
> > @@ -57,6 +58,8 @@ void __init noreturn start_xen(unsigned long
> > bootcpu_id,
> >  if ( !boot_fdt_info(device_tree_flattened, dtb_addr) )
> >  BUG();
> >  
> > +    boot_stack_chk_guard_setup();
> 
> ... this function's use of get_random(), either arch_get_random()
> needs
> to be implemented, or (as Julien also pointed out for Arm64) NOW()
> needs
> to work. Yet get_s_time() presently expands to just BUG_ON(). Given
> this
> it's not even clear to me how Oleksii managed to actually test this.
I will double check that but it worked for me ( I didn't face BUG_ON()
).

~ Oleksii



Re: [PATCH v7 03/10] xen/x86: Add supporting code for uploading LAPIC contexts during domain create

2024-12-02 Thread Jan Beulich
On 21.10.2024 17:45, Alejandro Vallejo wrote:
> A later patch will upload LAPIC contexts as part of domain creation. In
> order for it not to encounter a problem where the architectural state
> does not reflect the APIC ID in the hidden state, this patch ensures
> updates to the hidden state trigger an update in the architectural
> registers so the APIC ID in both is consistent.
> 
> Signed-off-by: Alejandro Vallejo 

Reviewed-by: Jan Beulich 





Re: [PATCH v2 0/4] Move some boot code from assembly to C

2024-12-02 Thread Frediano Ziglio
ping

On Fri, Nov 22, 2024 at 9:34 AM Frediano Ziglio
 wrote:
>
> As a continuation of this series start sorting out the issue of headers
> not compatible with 32 bit.
> Instead of having to change headers which are almost only used for 64 bit
> allows to override headers or move reusable definitions to new shared
> headers.
> This results in less changes.
>
> Changes since v1:
> - rebased (with conflicts).
>
> Frediano Ziglio (4):
>   Use an include/boot directory to override headers for boot code
>   x86/boot: Use header to allows inclusion of public xen.h header
>   x86/boot: Move some settings to C
>   x86/boot: Use external symbols from cmdline_parse_early
>
>  xen/arch/x86/boot/Makefile|  2 +-
>  xen/arch/x86/boot/build32.lds.S   |  4 
>  xen/arch/x86/boot/cmdline.c   | 14 ++--
>  xen/arch/x86/boot/head.S  | 19 +--
>  xen/arch/x86/boot/reloc.c | 28 ++-
>  xen/arch/x86/boot/trampoline.S|  2 +-
>  xen/arch/x86/include/asm/guest/pvh-boot.h |  1 +
>  xen/arch/x86/include/asm/setup.h  |  2 ++
>  xen/arch/x86/include/boot/public/xen.h| 28 +++
>  xen/arch/x86/include/boot/xen/cpumask.h   |  1 +
>  xen/arch/x86/include/boot/xen/string.h| 10 
>  11 files changed, 83 insertions(+), 28 deletions(-)
>  create mode 100644 xen/arch/x86/include/boot/public/xen.h
>  create mode 100644 xen/arch/x86/include/boot/xen/cpumask.h
>  create mode 100644 xen/arch/x86/include/boot/xen/string.h
>



Re: [PATCH v2 1/2] xen/swiotlb: add alignment check for dma buffers

2024-12-02 Thread Jürgen Groß

On 29.11.24 18:36, Thierry Escande wrote:

Hi,

On 16/09/2024 08:47, Juergen Gross wrote:

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 35155258a7e2..ddf5b1df632e 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -78,9 +78,15 @@ static inline int range_straddles_page_boundary(phys_addr_t 
p, size_t size)
  {
unsigned long next_bfn, xen_pfn = XEN_PFN_DOWN(p);
unsigned int i, nr_pages = XEN_PFN_UP(xen_offset_in_page(p) + size);
+   phys_addr_t algn = 1ULL << (get_order(size) + PAGE_SHIFT);
  
  	next_bfn = pfn_to_bfn(xen_pfn);
  
+	/* If buffer is physically aligned, ensure DMA alignment. */

+   if (IS_ALIGNED(p, algn) &&
+   !IS_ALIGNED(next_bfn << XEN_PAGE_SHIFT, algn))
+   return 1;


There is a regression in the mpt3sas driver because of this change.
When, in a dom0, this driver creates its DMA pool at probe time and
calls dma_pool_zalloc() (see [1]), the call to
range_straddles_page_boundary() (from xen_swiotlb_alloc_coherent())
returns 1 because of the alignment checks added by this patch. Then the
call to xen_create_contiguous_region() in xen_swiotlb_alloc_coherent()
fails because the passed size order is too big (> MAX_CONTIG_ORDER).
This driver sets the pool allocation block size to 2.3+ MBytes.

 From previous discussions on the v1 patch, these checks are not
necessary from xen_swiotlb_alloc_coherent() that already ensures
alignment, right?


It ensures alignment regarding guest physical memory, but it doesn't do
so for machine memory.

For DMA machine memory proper alignment might be needed, too, so the
check is required. As an example, some crypto drivers seem to rely on
proper machine memory alignment, which was the reason for those checks
to be added.

Possible solutions include:

- rising the MAX_CONTIG_ORDER limit (to which value?)
- adding a way to allocate large DMA buffers with relaxed alignment
  requirements (this will impact the whole DMA infrastructure plus
  drivers like mp3sas which would need to use the new interface)
- modify the mpt3sas driver to stay within the current limits
- only guarantee proper machine memory alignment up to MAX_CONTIG_ORDER
  (risking to introduce hard to diagnose bugs for the rare use cases of
  such large buffers)


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH v10 11/12] xen/arm: make consider_modules() available for xen relocation

2024-12-02 Thread Michal Orzel



On 19/11/2024 15:13, Carlo Nonato wrote:
> 
> 
> Cache coloring must physically relocate Xen in order to color the hypervisor
> and consider_modules() is a key function that is needed to find a new
> available physical address.
> 
> 672d67f339c0 ("xen/arm: Split MMU-specific setup_mm() and related code out")
> moved consider_modules() under arm32. Move it to mmu/setup.c and make it
> non-static so that it can be used outside.
> 
> Signed-off-by: Carlo Nonato 
> ---
> v10:
> - no changes
> v9:
> - no changes
> v8:
> - patch adapted to new changes to consider_modules()
> v7:
> - moved consider_modules() to arm/mmu/setup.c
> v6:
> - new patch
> ---
>  xen/arch/arm/arm32/mmu/mm.c  | 95 +--
>  xen/arch/arm/include/asm/setup.h |  3 +
>  xen/arch/arm/mmu/setup.c | 97 
>  3 files changed, 101 insertions(+), 94 deletions(-)
> 
> diff --git a/xen/arch/arm/arm32/mmu/mm.c b/xen/arch/arm/arm32/mmu/mm.c
> index 063611412b..c5fcd19291 100644
> --- a/xen/arch/arm/arm32/mmu/mm.c
> +++ b/xen/arch/arm/arm32/mmu/mm.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
Sort alphabetically.

> 
>  static unsigned long opt_xenheap_megabytes __initdata;
>  integer_param("xenheap_megabytes", opt_xenheap_megabytes);
> @@ -31,100 +32,6 @@ static void __init setup_directmap_mappings(unsigned long 
> base_mfn,
>  directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
>  }
> 
> -/*
> - * Returns the end address of the highest region in the range s..e
> - * with required size and alignment that does not conflict with the
> - * modules from first_mod to nr_modules.
> - *
> - * For non-recursive callers first_mod should normally be 0 (all
> - * modules and Xen itself) or 1 (all modules but not Xen).
> - */
> -static paddr_t __init consider_modules(paddr_t s, paddr_t e,
> -   uint32_t size, paddr_t align,
> -   int first_mod)
> -{
> -const struct membanks *reserved_mem = bootinfo_get_reserved_mem();
> -#ifdef CONFIG_STATIC_SHM
> -const struct membanks *shmem = bootinfo_get_shmem();
> -#endif
> -const struct bootmodules *mi = &bootinfo.modules;
> -int i;
> -int nr;
> -
> -s = (s+align-1) & ~(align-1);
> -e = e & ~(align-1);
> -
> -if ( s > e ||  e - s < size )
> -return 0;
> -
> -/* First check the boot modules */
> -for ( i = first_mod; i < mi->nr_mods; i++ )
> -{
> -paddr_t mod_s = mi->module[i].start;
> -paddr_t mod_e = mod_s + mi->module[i].size;
> -
> -if ( s < mod_e && mod_s < e )
> -{
> -mod_e = consider_modules(mod_e, e, size, align, i+1);
> -if ( mod_e )
> -return mod_e;
> -
> -return consider_modules(s, mod_s, size, align, i+1);
> -}
> -}
> -
> -/*
> - * i is the current bootmodule we are evaluating, across all
> - * possible kinds of bootmodules.
> - *
> - * When retrieving the corresponding reserved-memory addresses, we
> - * need to index the reserved_mem bank starting from 0, and only counting
> - * the reserved-memory modules. Hence, we need to use i - nr.
> - */
> -nr = mi->nr_mods;
> -for ( ; i - nr < reserved_mem->nr_banks; i++ )
> -{
> -paddr_t r_s = reserved_mem->bank[i - nr].start;
> -paddr_t r_e = r_s + reserved_mem->bank[i - nr].size;
> -
> -if ( s < r_e && r_s < e )
> -{
> -r_e = consider_modules(r_e, e, size, align, i + 1);
> -if ( r_e )
> -return r_e;
> -
> -return consider_modules(s, r_s, size, align, i + 1);
> -}
> -}
> -
> -#ifdef CONFIG_STATIC_SHM
> -nr += reserved_mem->nr_banks;
> -for ( ; i - nr < shmem->nr_banks; i++ )
> -{
> -paddr_t r_s, r_e;
> -
> -r_s = shmem->bank[i - nr].start;
> -
> -/* Shared memory banks can contain INVALID_PADDR as start */
> -if ( INVALID_PADDR == r_s )
> -continue;
> -
> -r_e = r_s + shmem->bank[i - nr].size;
> -
> -if ( s < r_e && r_s < e )
> -{
> -r_e = consider_modules(r_e, e, size, align, i + 1);
> -if ( r_e )
> -return r_e;
> -
> -return consider_modules(s, r_s, size, align, i + 1);
> -}
> -}
> -#endif
> -
> -return e;
> -}
> -
>  /*
>   * Find a contiguous region that fits in the static heap region with
>   * required size and alignment, and return the end address of the region
> diff --git a/xen/arch/arm/include/asm/setup.h 
> b/xen/arch/arm/include/asm/setup.h
> index 64c227d171..0c560d141f 100644
> --- a/xen/arch/arm/include/asm/setup.h
> +++ b/xen/arch/arm/include/asm/setup.h
> @@ -89,6 +89,9 @@ struct init_info
>  unsigned int cpuid;
>  };
> 
> +paddr_t consider_modules(paddr_t s, paddr_t e, uint32_t size, paddr_t align,
> + int first_mod);

Re: [PATCH 06/15] x86/hyperlaunch: introduce the domain builder

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> Introduce the domain builder which is capable of consuming a device tree as 
> the
> first boot module. If it finds a device tree as the first boot module, it will
> set its type to BOOTMOD_FDT. This change only detects the boot module and
> continues to boot with slight change to the boot convention that the dom0
> kernel is no longer first boot module but is the second.
> 
> No functional change intended.
> 
> Signed-off-by: Daniel P. Smith 
> ---
>  xen/arch/x86/Makefile|  2 +
>  xen/arch/x86/domain_builder/Makefile |  3 ++
>  xen/arch/x86/domain_builder/core.c   | 55 
>  xen/arch/x86/domain_builder/fdt.c| 38 
>  xen/arch/x86/domain_builder/fdt.h| 21 +
>  xen/arch/x86/include/asm/bootinfo.h  |  3 ++
>  xen/arch/x86/include/asm/domainbuilder.h |  8 
>  xen/arch/x86/setup.c | 18 +---
>  8 files changed, 142 insertions(+), 6 deletions(-)
>  create mode 100644 xen/arch/x86/domain_builder/Makefile
>  create mode 100644 xen/arch/x86/domain_builder/core.c
>  create mode 100644 xen/arch/x86/domain_builder/fdt.c
>  create mode 100644 xen/arch/x86/domain_builder/fdt.h

As I'm sure I indicated before: Dashes instead of underscores please in new
files' names.

>  create mode 100644 xen/arch/x86/include/asm/domainbuilder.h

Why is there no separator in this file's name?

Similar question as on an earlier patch: Why is all of this x86-specific, when
a goal was generalization?

> --- /dev/null
> +++ b/xen/arch/x86/domain_builder/core.c
> @@ -0,0 +1,55 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2024, Apertus Solutions, LLC
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "fdt.h"
> +
> +void __init builder_init(struct boot_info *bi)
> +{
> +if ( IS_ENABLED(CONFIG_DOMAIN_BUILDER) )
> +{
> +int ret;
> +
> +switch ( ret = has_hyperlaunch_fdt(bi) )
> +{
> +case 0:
> +printk("Hyperlaunch device tree detected\n");
> +bi->hyperlaunch_enabled = true;
> +bi->mods[0].type = BOOTMOD_FDT;
> +break;
> +case -EINVAL:
> +printk("Hyperlaunch device tree was not detected\n");
> +bi->hyperlaunch_enabled = false;
> +break;
> +case -ENOENT:
> +fallthrough;

No need for this.

> +case -ENODATA:
> +printk("Device tree found, but not hyperlaunch (%d)\n", ret);
> +bi->hyperlaunch_enabled = false;
> +bi->mods[0].type = BOOTMOD_FDT;
> +break;
> +default:
> +printk("Unknown error (%d) occured checking for hyperlaunch 
> device tree\n",
> +   ret);
> +bi->hyperlaunch_enabled = false;
> +}

Nit: Misra demands "break" at the end of default as well.

Blank lines between non-fallthrough blocks would also be nice.

> +

Nit: Excess blank line.

> --- /dev/null
> +++ b/xen/arch/x86/domain_builder/fdt.c
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2024, Apertus Solutions, LLC
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include  /* required for asm/setup.h */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "fdt.h"
> +
> +int __init has_hyperlaunch_fdt(struct boot_info *bi)
> +{
> +int ret = 0;
> +void *fdt = bootstrap_map_bm(&bi->mods[HYPERLAUNCH_MODULE_IDX]);

const void *?

> @@ -1277,9 +1278,12 @@ void asmlinkage __init noreturn __start_xen(void)
> bi->nr_modules);
>  }
>  
> -/* Dom0 kernel is always first */
> -bi->mods[0].type = BOOTMOD_KERNEL;
> -bi->domains[0].kernel = &bi->mods[0];
> +builder_init(bi);
> +
> +/* Find first unknown boot module to use as Dom0 kernel */
> +i = first_boot_module_index(bi, BOOTMOD_UNKNOWN);
> +bi->mods[i].type = BOOTMOD_KERNEL;
> +bi->domains[0].kernel = &bi->mods[i];

Better latch the result here into a separate local variable, for use ...

> @@ -1466,8 +1470,9 @@ void asmlinkage __init noreturn __start_xen(void)
>  xen->size  = __2M_rwdata_end - _stext;
>  }
>  
> -bi->mods[0].headroom =
> -bzimage_headroom(bootstrap_map_bm(&bi->mods[0]), bi->mods[0].size);
> +i = first_boot_module_index(bi, BOOTMOD_KERNEL);
> +bi->mods[i].headroom =
> +bzimage_headroom(bootstrap_map_bm(&bi->mods[i]), bi->mods[i].size);
>  bootstrap_unmap();
>  
>  #ifndef highmem_start
> @@ -1591,7 +1596,8 @@ void asmlinkage __init noreturn __start_xen(void)
>  #endif
>  }
>  
> -if ( bi->mods[0].headroom && !bi->mods[0].relocated )
> +i = first_boot_module_index(bi, BOOTMOD_KERNEL);
> +if ( bi->mods[i].headroom && !bi->mods[0].relocated )
>  panic("Not enough memory to relocate the dom0 kernel image\n");
>  for ( i = 0; i < bi->nr_modules; ++

Re: [PATCH v2 1/4] common: remove -fno-stack-protector from EMBEDDED_EXTRA_CFLAGS

2024-12-02 Thread Jan Beulich
On 02.12.2024 09:06, Jan Beulich wrote:
> On 30.11.2024 02:10, Volodymyr Babchuk wrote:
>> This patch is preparation for making stack protector
>> configurable. First step is to remove -fno-stack-protector flag from
>> EMBEDDED_EXTRA_CFLAGS so separate projects (Hypervisor in this case)
>> can enable/disable this feature by themselves.
> 
> s/projects/components/ ?
> 
>> --- a/stubdom/Makefile
>> +++ b/stubdom/Makefile
>> @@ -54,6 +54,8 @@ TARGET_CFLAGS += $(CFLAGS)
>>  TARGET_CPPFLAGS += $(CPPFLAGS)
>>  $(call cc-options-add,TARGET_CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
>>  
>> +$(call cc-option-add,TARGET_CFLAGS,CC,-fno-stack-protector)
>> +
>>  # Do not use host headers and libs
>>  GCC_INSTALL = $(shell LANG=C gcc -print-search-dirs | sed -n -e 's/install: 
>> \(.*\)/\1/p')
>>  TARGET_CPPFLAGS += -U __linux__ -U __FreeBSD__ -U __sun__
>> --- a/tools/firmware/Rules.mk
>> +++ b/tools/firmware/Rules.mk
>> @@ -15,6 +15,8 @@ $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
>>  
>>  $(call cc-option-add,CFLAGS,CC,-fcf-protection=none)
>>  
>> +$(call cc-option-add,CFLAGS,CC,-fno-stack-protector)
>> +
>>  # Do not add the .note.gnu.property section to any of the firmware objects: 
>> it
>>  # breaks the rombios binary and is not useful for firmware anyway.
>>  $(call cc-option-add,CFLAGS,CC,-Wa$$(comma)-mx86-used-note=no)
>> --- a/tools/tests/x86_emulator/testcase.mk
>> +++ b/tools/tests/x86_emulator/testcase.mk
>> @@ -4,6 +4,8 @@ include $(XEN_ROOT)/tools/Rules.mk
>>  
>>  $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
>>  
>> +$(call cc-option-add,CFLAGS,CC,-fno-stack-protector)
> 
> Is use of cc-option-add really necessary throughout here, when ...
> 
>> --- a/xen/Makefile
>> +++ b/xen/Makefile
>> @@ -432,6 +432,8 @@ else
>>  CFLAGS_UBSAN :=
>>  endif
>>  
>> +CFLAGS += -fno-stack-protector
> 
> ... is isn't needed here? Iirc the compiler version ranges supported don't
> vary between components. Then again afaics $(EMBEDDED_EXTRA_CFLAGS) is used
> by x86 only right now, and with cc-options-add, so perhaps it (a) needs
> using cc-options-add here, too, and (b) it wants explaining why this needs
> generalizing from x86 to all architectures. Quite possibly hypervisor use
> of $(EMBEDDED_EXTRA_CFLAGS) may want generalizing separately, up front?

Correction: Except for PPC all architectures consume $(EMBEDDED_EXTRA_CFLAGS)
right now. So the moving is less of a generalization than I first thought. I
still need to get used to passing -R (rather than -r) to grep, to find all
instances I'm after ...

Jan



Re: [PATCH 07/15] x86/hyperlaunch: initial support for hyperlaunch device tree

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> Add the ability to detect both a formal hyperlaunch device tree or a dom0less
> device tree. If the hyperlaunch device tree is found, then count the number of
> domain entries, reporting if more than one is found.

"reporting" reads like informational logging, when comment and printk() in
walk_hyperlaunch_fdt() indicate this is actually an error (for now).

> --- a/xen/arch/x86/domain_builder/fdt.c
> +++ b/xen/arch/x86/domain_builder/fdt.c
> @@ -14,14 +14,76 @@
>  
>  #include "fdt.h"
>  
> +static int __init find_hyperlaunch_node(void *fdt)
> +{
> +int hv_node = fdt_path_offset(fdt, "/chosen/hypervisor");
> +if ( hv_node >= 0 )

Nit: Blank line between declaration(s) and statement(s) please (also
elsewhere).

> --- a/xen/arch/x86/domain_builder/fdt.h
> +++ b/xen/arch/x86/domain_builder/fdt.h
> @@ -11,11 +11,16 @@
>  
>  #ifdef CONFIG_DOMAIN_BUILDER
>  int has_hyperlaunch_fdt(struct boot_info *bi);
> +int walk_hyperlaunch_fdt(struct boot_info *bi);
>  #else
>  static inline int __init has_hyperlaunch_fdt(struct boot_info *bi)
>  {
>  return -EINVAL;
>  }
> +static int __init walk_hyperlaunch_fdt(struct boot_info *bi)

inline?

Jan



Re: [PATCH 14/15] x86/hyperlaunch: add max vcpu parsing of hyperlaunch device tree

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -617,6 +617,9 @@ int __init construct_dom0(struct boot_domain *bd)
>  if ( !get_memsize(&dom0_max_size, LONG_MAX) && bd->max_pages )
>  dom0_size.nr_pages = bd->max_pages;
>  
> +if ( opt_dom0_max_vcpus_max == UINT_MAX && bd->max_vcpus )
> +opt_dom0_max_vcpus_max = bd->max_vcpus;

Isn't this kind of backwards? I.e. aren't you meaning to move us towards
boot-domains?

Also, what about the counterpart opt_dom0_max_vcpus_min? That wants to be
controllable from DT too, I would think?

Jan



Re: [PATCH 15/15] x86/hyperlaunch: add capabilities to boot domain

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> --- a/xen/arch/x86/domain_builder/fdt.c
> +++ b/xen/arch/x86/domain_builder/fdt.c
> @@ -209,6 +209,19 @@ static int __init process_domain_node(
>  bd->max_vcpus = val;
>  printk("  max vcpus: %d\n", bd->max_vcpus);
>  }
> +if ( match_fdt_property(fdt, prop, "capabilities" ) )
> +{
> +if ( fdt_prop_as_u32(prop, &bd->capabilities) != 0 )
> +{
> +printk("  failed processing domain id for domain %s\n",
> +   name == NULL ? "unknown" : name);
> +return -EINVAL;
> +}
> +printk("  caps: ");
> +if ( bd->capabilities & BUILD_CAPS_CONTROL )
> +printk("c");
> +printk("\n");
> +}

What if any of the other bits is set?

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -992,6 +992,7 @@ static size_t __init domain_cmdline_size(
>  static struct domain *__init create_dom0(struct boot_info *bi)
>  {
>  char *cmdline = NULL;
> +int create_flags = 0;

Once again unsigned int please.

> @@ -1023,7 +1024,10 @@ static struct domain *__init create_dom0(struct 
> boot_info *bi)
>  /* Create initial domain.  Not d0 for pvshim. */
>  if ( bd->domid == DOMID_INVALID )
>  bd->domid = get_initial_domain_id();
> -d = domain_create(bd->domid, &dom0_cfg, pv_shim ? 0 : CDF_privileged);
> +if ( bd->capabilities & BUILD_CAPS_CONTROL )
> +create_flags |= CDF_privileged;

Nit: Indentation.

Jan



[PATCH v11 11/12] xen/arm: make consider_modules() available for xen relocation

2024-12-02 Thread Carlo Nonato
Cache coloring must physically relocate Xen in order to color the hypervisor
and consider_modules() is a key function that is needed to find a new
available physical address.

672d67f339c0 ("xen/arm: Split MMU-specific setup_mm() and related code out")
moved consider_modules() under arm32. Move it to mmu/setup.c and make it
non-static so that it can be used outside.

Signed-off-by: Carlo Nonato 
Reviewed-by: Michal Orzel 
---
v11:
- removed useless #include
v10:
- no changes
v9:
- no changes
v8:
- patch adapted to new changes to consider_modules()
v7:
- moved consider_modules() to arm/mmu/setup.c
v6:
- new patch
---
 xen/arch/arm/arm32/mmu/mm.c  | 95 +---
 xen/arch/arm/include/asm/setup.h |  3 +
 xen/arch/arm/mmu/setup.c | 94 +++
 3 files changed, 98 insertions(+), 94 deletions(-)

diff --git a/xen/arch/arm/arm32/mmu/mm.c b/xen/arch/arm/arm32/mmu/mm.c
index 063611412b..903d946f07 100644
--- a/xen/arch/arm/arm32/mmu/mm.c
+++ b/xen/arch/arm/arm32/mmu/mm.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -31,100 +32,6 @@ static void __init setup_directmap_mappings(unsigned long 
base_mfn,
 directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
 }
 
-/*
- * Returns the end address of the highest region in the range s..e
- * with required size and alignment that does not conflict with the
- * modules from first_mod to nr_modules.
- *
- * For non-recursive callers first_mod should normally be 0 (all
- * modules and Xen itself) or 1 (all modules but not Xen).
- */
-static paddr_t __init consider_modules(paddr_t s, paddr_t e,
-   uint32_t size, paddr_t align,
-   int first_mod)
-{
-const struct membanks *reserved_mem = bootinfo_get_reserved_mem();
-#ifdef CONFIG_STATIC_SHM
-const struct membanks *shmem = bootinfo_get_shmem();
-#endif
-const struct bootmodules *mi = &bootinfo.modules;
-int i;
-int nr;
-
-s = (s+align-1) & ~(align-1);
-e = e & ~(align-1);
-
-if ( s > e ||  e - s < size )
-return 0;
-
-/* First check the boot modules */
-for ( i = first_mod; i < mi->nr_mods; i++ )
-{
-paddr_t mod_s = mi->module[i].start;
-paddr_t mod_e = mod_s + mi->module[i].size;
-
-if ( s < mod_e && mod_s < e )
-{
-mod_e = consider_modules(mod_e, e, size, align, i+1);
-if ( mod_e )
-return mod_e;
-
-return consider_modules(s, mod_s, size, align, i+1);
-}
-}
-
-/*
- * i is the current bootmodule we are evaluating, across all
- * possible kinds of bootmodules.
- *
- * When retrieving the corresponding reserved-memory addresses, we
- * need to index the reserved_mem bank starting from 0, and only counting
- * the reserved-memory modules. Hence, we need to use i - nr.
- */
-nr = mi->nr_mods;
-for ( ; i - nr < reserved_mem->nr_banks; i++ )
-{
-paddr_t r_s = reserved_mem->bank[i - nr].start;
-paddr_t r_e = r_s + reserved_mem->bank[i - nr].size;
-
-if ( s < r_e && r_s < e )
-{
-r_e = consider_modules(r_e, e, size, align, i + 1);
-if ( r_e )
-return r_e;
-
-return consider_modules(s, r_s, size, align, i + 1);
-}
-}
-
-#ifdef CONFIG_STATIC_SHM
-nr += reserved_mem->nr_banks;
-for ( ; i - nr < shmem->nr_banks; i++ )
-{
-paddr_t r_s, r_e;
-
-r_s = shmem->bank[i - nr].start;
-
-/* Shared memory banks can contain INVALID_PADDR as start */
-if ( INVALID_PADDR == r_s )
-continue;
-
-r_e = r_s + shmem->bank[i - nr].size;
-
-if ( s < r_e && r_s < e )
-{
-r_e = consider_modules(r_e, e, size, align, i + 1);
-if ( r_e )
-return r_e;
-
-return consider_modules(s, r_s, size, align, i + 1);
-}
-}
-#endif
-
-return e;
-}
-
 /*
  * Find a contiguous region that fits in the static heap region with
  * required size and alignment, and return the end address of the region
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 64c227d171..0c560d141f 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -89,6 +89,9 @@ struct init_info
 unsigned int cpuid;
 };
 
+paddr_t consider_modules(paddr_t s, paddr_t e, uint32_t size, paddr_t align,
+ int first_mod);
+
 #endif
 /*
  * Local variables:
diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
index 9664e85ee6..196974f3e2 100644
--- a/xen/arch/arm/mmu/setup.c
+++ b/xen/arch/arm/mmu/setup.c
@@ -222,6 +222,100 @@ static void xen_pt_enforce_wnx(void)
 flush_xen_tlb_local();
 }
 
+/*
+ * Returns the end address of the highest region in the range s..e
+ * with required size and alignment that doe

[PATCH v11 09/12] xen: add cache coloring allocator for domains

2024-12-02 Thread Carlo Nonato
Add a new memory page allocator that implements the cache coloring mechanism.
The allocation algorithm enforces equal frequency distribution of cache
partitions, following the coloring configuration of a domain. This allows
for an even utilization of cache sets for every domain.

Pages are stored in a color-indexed array of lists. Those lists are filled
by a simple init function which computes the color of each page.
When a domain requests a page, the allocator extracts the page from the list
with the maximum number of free pages among those that the domain can access,
given its coloring configuration.

The allocator can only handle requests of order-0 pages. This allows for
easier implementation and since cache coloring targets only embedded systems,
it's assumed not to be a major problem.

The buddy allocator must coexist with the colored one because the Xen heap
isn't colored. For this reason a new Kconfig option and a command line
parameter are added to let the user set the amount of memory reserved for
the buddy allocator. Even when cache coloring is enabled, this memory
isn't managed by the colored allocator.

Colored heap information is dumped in the dump_heap() debug-key function.

Based on original work from: Luca Miccio 

Signed-off-by: Marco Solieri 
Signed-off-by: Carlo Nonato 
---
v11:
- CONFIG_BUDDY_ALLOCATOR_SIZE depends on CONFIG_LLC_COLORING
- buddy_alloc_size is defined only if CONFIG_LLC_COLORING
- buddy-alloc-size param is parsed only if CONFIG_LLC_COLORING
v10:
- stated explicit dependency on CONFIG_LLC_COLORING for buddy-alloc-size
- fix for MISRA rule 20.7 parenthesis
v9:
- added ASSERT(order == 0) when freeing a colored page
- moved buddy_alloc_size initialization logic in Kconfig
v8:
- requests that uses MEMF_* flags that can't be served are now going to fail
- free_color_heap_page() is called directly from free_heap_pages()
v7:
- requests to alloc_color_heap_page() now fail if MEMF_bits is used
v6:
- colored allocator functions are now static
v5:
- Carlo Nonato as the new author
- the colored allocator balances color usage for each domain and it searches
  linearly only in the number of colors (FIXME removed)
- addedd scrub functionality
- removed stub functions (still requires some macro definition)
- addr_to_color turned to mfn_to_color for easier operations
- removed BUG_ON in init_color_heap_pages() in favor of panic()
- only non empty page lists are logged in dump_color_heap()
v4:
- moved colored allocator code after buddy allocator because it now has
  some dependencies on buddy functions
- buddy_alloc_size is now used only by the colored allocator
- fixed a bug that allowed the buddy to merge pages when they were colored
- free_color_heap_page() now calls mark_page_free()
- free_color_heap_page() uses of the frametable array for faster searches
- added FIXME comment for the linear search in free_color_heap_page()
- removed alloc_color_domheap_page() to let the colored allocator exploit
  some more buddy allocator code
- alloc_color_heap_page() now allocs min address pages first
- reduced the mess in end_boot_allocator(): use the first loop for
  init_color_heap_pages()
- fixed page_list_add_prev() (list.h) since it was doing the opposite of
  what it was supposed to do
- fixed page_list_add_prev() (non list.h) to check also for next existence
- removed unused page_list_add_next()
- moved p2m code in another patch
---
 docs/misc/cache-coloring.rst  |  37 ++
 docs/misc/xen-command-line.pandoc |  14 +++
 xen/arch/arm/include/asm/mm.h |   5 +
 xen/common/Kconfig|   8 ++
 xen/common/llc-coloring.c |  13 ++
 xen/common/page_alloc.c   | 191 +-
 xen/include/xen/llc-coloring.h|   3 +
 7 files changed, 267 insertions(+), 4 deletions(-)

diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
index e097e74032..5224b27afe 100644
--- a/docs/misc/cache-coloring.rst
+++ b/docs/misc/cache-coloring.rst
@@ -13,6 +13,9 @@ To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
 If needed, change the maximum number of colors with
 ``CONFIG_LLC_COLORS_ORDER=``.
 
+If needed, change the buddy allocator reserved size with
+``CONFIG_BUDDY_ALLOCATOR_SIZE=``.
+
 Runtime configuration is done via `Command line parameters`_.
 For DomUs follow `DomUs configuration`_.
 
@@ -110,6 +113,8 @@ Specific documentation is available at 
`docs/misc/xen-command-line.pandoc`.
 +--+---+
 | ``dom0-llc-colors``  | Dom0 color configuration  |
 +--+---+
+| ``buddy-alloc-size`` | Buddy allocator reserved size |
++--+---+
 
 Colors selection format
 ***
@@ -197,6 +202,17 @@ For example:
 **Note:** If no color configuration is provided for a domain, the default one,
 which corresponds to all available colors is used instead.
 
+Colored allocator and buddy all

Re: [PATCH v11 12/12] xen/arm: add cache coloring support for Xen image

2024-12-02 Thread Julien Grall

Hi Carlo,

I appreciate this is v11. So I will try to keep the comments to only 
what I consider important to fix and some coding style issue.


On 02/12/2024 16:59, Carlo Nonato wrote:

---
  xen/arch/arm/alternative.c| 30 +++-
  xen/arch/arm/arm64/mmu/head.S | 58 +++-
  xen/arch/arm/arm64/mmu/mm.c   | 28 ++--
  xen/arch/arm/include/asm/mmu/layout.h |  3 +
  xen/arch/arm/llc-coloring.c   | 63 +
  xen/arch/arm/mmu/setup.c  | 99 +++
  xen/arch/arm/setup.c  | 10 ++-
  xen/common/llc-coloring.c | 18 +
  xen/include/xen/llc-coloring.h| 14 
  9 files changed, 302 insertions(+), 21 deletions(-)

diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c
index d99b507093..0fcf4e451d 100644
--- a/xen/arch/arm/alternative.c
+++ b/xen/arch/arm/alternative.c
@@ -9,6 +9,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -191,6 +192,27 @@ static int __apply_alternatives_multi_stop(void *xenmap)
  return 0;
  }
  
+static void __init *xen_remap_colored(mfn_t xen_mfn, paddr_t xen_size)

+{
+unsigned int i;
+void *xenmap;
+mfn_t *xen_colored_mfns, mfn;
+
+xen_colored_mfns = xmalloc_array(mfn_t, xen_size >> PAGE_SHIFT);
> +if ( !xen_colored_mfns )> +panic("Can't allocate LLC 
colored MFNs\n");

+
+for_each_xen_colored_mfn ( xen_mfn, mfn, i )
+{
+xen_colored_mfns[i] = mfn;
+}


NIT: Parenthesis should not be necessary.


+
+xenmap = vmap(xen_colored_mfns, xen_size >> PAGE_SHIFT);

> +xfree(xen_colored_mfns);> +

+return xenmap;
+}
+
  /*
   * This function should only be called during boot and before CPU0 jump
   * into the idle_loop.
@@ -209,8 +231,12 @@ void __init apply_alternatives_all(void)
   * The text and inittext section are read-only. So re-map Xen to
   * be able to patch the code.
   */
-xenmap = __vmap(&xen_mfn, 1U << xen_order, 1, 1, PAGE_HYPERVISOR,
-VMAP_DEFAULT);
+if ( llc_coloring_enabled )
+xenmap = xen_remap_colored(xen_mfn, xen_size);
+else
+xenmap = __vmap(&xen_mfn, 1U << xen_order, 1, 1, PAGE_HYPERVISOR,
+VMAP_DEFAULT);
+
  /* Re-mapping Xen is not expected to fail during boot. */
  BUG_ON(!xenmap);
  
diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S

index 665a51a337..a1fc9a82f1 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -428,6 +428,61 @@ FUNC_LOCAL(fail)
  b 1b
  END(fail)
  
+/*

+ * Copy Xen to new location and switch TTBR
+ * x0ttbr
+ * x1source address
+ * x2destination address
+ * x3length
+ *
+ * Source and destination must be word aligned, length is rounded up
+ * to a 16 byte boundary.
+ *
+ * MUST BE VERY CAREFUL when saving things to RAM over the copy
+ */
+ENTRY(relocate_xen)


We are trying to get rid of ENTRY. Instead, please use FUNC().


+/*
+ * Copy 16 bytes at a time using:
+ *   x9: counter
+ *   x10: data
+ *   x11: data
+ *   x12: source
+ *   x13: destination
+ */
+mov x9, x3
+mov x12, x1
+mov x13, x2
+
+1:  ldp x10, x11, [x12], #16
+stp x10, x11, [x13], #16
+
+subsx9, x9, #16
+bgt 1b
+
+/*
+ * Flush destination from dcache using:
+ *   x9: counter
+ *   x10: step
+ *   x11: vaddr
+ *
+ * This is to ensure data is visible to the instruction cache

> + */

The comments implies that the only reason we need to flush the cache is 
to ensure the data and cache is coherent. But...



+dsb   sy
+
+mov   x9, x3
+ldr   x10, =dcache_line_bytes /* x10 := step */

> +ldr   x10, [x10]> +mov   x11, x2

+
+1:  dccvac, x11


... here you use Point of Coherency. Point of Unification should be 
sufficient here. This is boot code, so I am not against having stricter 
cache maintenance. But it would be good to clarify.



+
+add   x11, x11, x10
+subs  x9, x9, x10
+bgt   1b
+
+/* No need for dsb/isb because they are alredy done in switch_ttbr_id 
*/
+b switch_ttbr_id
+
  /*
   * Switch TTBR
   *
@@ -453,7 +508,8 @@ FUNC(switch_ttbr_id)
  
  /*

   * 5) Flush I-cache
- * This should not be necessary but it is kept for safety.
+ * This should not be necessary in the general case, but it's needed
+ * for cache coloring because code is relocated in that case.
   */
  ic iallu
  isb
diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index 671eaadbc1..3732d5897e 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -1,6 +1,7 @@
  /* SPDX-License-Identifier: GPL-

[PATCH v11 07/12] xen/arm: add support for cache coloring configuration via device-tree

2024-12-02 Thread Carlo Nonato
Add the "llc-colors" Device Tree property to express DomUs and Dom0less
color configurations.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
Reviewed-by: Jan Beulich  # non-Arm
Reviewed-by: Michal Orzel 
---
v11:
- made clear that llc-colors device-tree property is Arm64-only in booting.txt
v10:
- no changes
v9:
- use best-effort allocation in domain_set_llc_colors_from_str()
v8:
- fixed memory leak on error path of domain_set_llc_colors_from_str()
- realloc colors array after parsing from string to reduce memory usage
v7:
- removed alloc_colors() helper usage from domain_set_llc_colors_from_str()
v6:
- rewrote domain_set_llc_colors_from_str() to be more explicit
v5:
- static-mem check has been moved in a previous patch
- added domain_set_llc_colors_from_str() to set colors after domain creation
---
 docs/misc/arm/device-tree/booting.txt |  5 +++
 docs/misc/cache-coloring.rst  | 48 +++
 xen/arch/arm/dom0less-build.c | 10 ++
 xen/common/llc-coloring.c | 40 ++
 xen/include/xen/llc-coloring.h|  1 +
 xen/include/xen/xmalloc.h | 12 +++
 6 files changed, 116 insertions(+)

diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt
index 3a04f5c57f..9c881baccc 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -162,6 +162,11 @@ with the following properties:
 
 An integer specifying the number of vcpus to allocate to the guest.
 
+- llc-colors
+A string specifying the LLC color configuration for the guest.
+Refer to docs/misc/cache_coloring.rst for syntax. This option is applicable
+only to Arm64 guests.
+
 - vpl011
 
 An empty property to enable/disable a virtual pl011 for the guest to
diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
index 7b47d0ed92..e097e74032 100644
--- a/docs/misc/cache-coloring.rst
+++ b/docs/misc/cache-coloring.rst
@@ -14,6 +14,7 @@ If needed, change the maximum number of colors with
 ``CONFIG_LLC_COLORS_ORDER=``.
 
 Runtime configuration is done via `Command line parameters`_.
+For DomUs follow `DomUs configuration`_.
 
 Background
 **
@@ -149,6 +150,53 @@ LLC specs can be manually set via the above command line 
parameters. This
 bypasses any auto-probing and it's used to overcome failing situations, such as
 flawed probing logic, or for debugging/testing purposes.
 
+DomUs configuration
+***
+
+DomUs colors can be set either in the ``xl`` configuration file (documentation
+at `docs/man/xl.cfg.pod.5.in`) or via Device Tree (documentation at
+`docs/misc/arm/device-tree/booting.txt`) using the ``llc-colors`` option.
+For example:
+
+::
+
+xen,xen-bootargs = "console=dtuart dtuart=serial0 dom0_mem=1G 
dom0_max_vcpus=1 sched=null llc-coloring=on dom0-llc-colors=2-6";
+xen,dom0-bootargs "console=hvc0 earlycon=xen earlyprintk=xen 
root=/dev/ram0"
+
+dom0 {
+compatible = "xen,linux-zimage" "xen,multiboot-module";
+reg = <0x0 0x100 0x0 15858176>;
+};
+
+dom0-ramdisk {
+compatible = "xen,linux-initrd" "xen,multiboot-module";
+reg = <0x0 0x200 0x0 20638062>;
+};
+
+domU0 {
+#address-cells = <0x1>;
+#size-cells = <0x1>;
+compatible = "xen,domain";
+memory = <0x0 0x4>;
+llc-colors = "4-8,10,11,12";
+cpus = <0x1>;
+vpl011 = <0x1>;
+
+module@200 {
+compatible = "multiboot,kernel", "multiboot,module";
+reg = <0x200 0xff>;
+bootargs = "console=ttyAMA0";
+};
+
+module@3000 {
+compatible = "multiboot,ramdisk", "multiboot,module";
+reg = <0x300 0xff>;
+};
+};
+
+**Note:** If no color configuration is provided for a domain, the default one,
+which corresponds to all available colors is used instead.
+
 Known issues and limitations
 
 
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 67b1503647..49d1f14d65 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -817,6 +817,7 @@ void __init create_domUs(void)
 bool iommu = false;
 const struct dt_device_node *cpupool_node,
 *chosen = dt_find_node_by_path("/chosen");
+const char *llc_colors_str = NULL;
 
 BUG_ON(chosen == NULL);
 dt_for_each_child_node(chosen, node)
@@ -965,6 +966,10 @@ void __init create_domUs(void)
 #endif
 }
 
+dt_property_read_string(node, "llc-colors", &llc_colors_str);
+if ( !llc_coloring_enabled && llc_colors_str )
+panic("'llc-colors' found, but LLC coloring is disabled\n");
+
 /*
  * The variable max_init_domid is initialized with zero, so here it's
  * very important to use the pre-increment operat

Re: [PATCH v3 2/5] arm/setup: Move MMU specific extern declarations to mmu/setup.h

2024-12-02 Thread Julien Grall

Hi Luca,

On 29/11/2024 09:12, Luca Fancellu wrote:

Move some extern declarations related to MMU structures and define
from asm/setup.h to asm/mmu/setup.h, in order to increase encapsulation
and allow the MPU part to build, since it has no clue about them.

Signed-off-by: Luca Fancellu 


Acked-by: Julien Grall 

Cheers,

--
Julien Grall




[PATCH v11 01/12] xen/common: add cache coloring common code

2024-12-02 Thread Carlo Nonato
Last Level Cache (LLC) coloring allows to partition the cache in smaller
chunks called cache colors.

Since not all architectures can actually implement it, add a HAS_LLC_COLORING
Kconfig option.
LLC_COLORS_ORDER Kconfig option has a range maximum of 10 (2^10 = 1024)
because that's the number of colors that fit in a 4 KiB page when integers
are 4 bytes long.

LLC colors are a property of the domain, so struct domain has to be extended.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
Acked-by: Michal Orzel 
---
v11:
- __COLORING_H__ -> __XEN_LLC_COLORING_H__ in llc-coloring.h
- added SPDX tag to cache-coloring.rst
- llc-coloring=off now takes precedence over other cmdline options
- removed useless #includes
v10:
- fixed commit message to use LLC_COLORS_ORDER
- added documentation to index.rst
- moved check on CONFIG_NUMA in arch/arm/Kconfig (next patch)
- fixed copyright line
- fixed array type for colors parameter in print_colors()
- added check on (way_size & ~PAGE_MASK)
v9:
- dropped _MAX_ from CONFIG_MAX_LLC_COLORS_ORDER
v8:
- minor documentation fixes
- "llc-coloring=on" is inferred from "llc-nr-ways" and "llc-size" usage
- turned CONFIG_NR_LLC_COLORS to CONFIG_MAX_LLC_COLORS_ORDER, base-2 exponent
- moved Kconfig options to common/Kconfig
- don't crash if computed max_nr_colors is too large
v7:
- SUPPORT.md changes added to this patch
- extended documentation to better address applicability of cache coloring
- "llc-nr-ways" and "llc-size" params introduced in favor of "llc-way-size"
- moved dump_llc_coloring_info() call in 'm' keyhandler (pagealloc_info())
v6:
- moved almost all code in common
- moved documentation in this patch
- reintroduced range for CONFIG_NR_LLC_COLORS
- reintroduced some stub functions to reduce the number of checks on
  llc_coloring_enabled
- moved domain_llc_coloring_free() in same patch where allocation happens
- turned "d->llc_colors" to pointer-to-const
- llc_coloring_init() now returns void and panics if errors are found
v5:
- used - instead of _ for filenames
- removed domain_create_llc_colored()
- removed stub functions
- coloring domain fields are now #ifdef protected
v4:
- Kconfig options moved to xen/arch
- removed range for CONFIG_NR_LLC_COLORS
- added "llc_coloring_enabled" global to later implement the boot-time
  switch
- added domain_create_llc_colored() to be able to pass colors
- added is_domain_llc_colored() macro
---
 SUPPORT.md|   7 ++
 docs/index.rst|   1 +
 docs/misc/cache-coloring.rst  | 118 +
 docs/misc/xen-command-line.pandoc |  37 +
 xen/common/Kconfig|  21 ++
 xen/common/Makefile   |   1 +
 xen/common/keyhandler.c   |   3 +
 xen/common/llc-coloring.c | 121 ++
 xen/common/page_alloc.c   |   3 +
 xen/include/xen/llc-coloring.h|  36 +
 xen/include/xen/sched.h   |   5 ++
 11 files changed, 353 insertions(+)
 create mode 100644 docs/misc/cache-coloring.rst
 create mode 100644 xen/common/llc-coloring.c
 create mode 100644 xen/include/xen/llc-coloring.h

diff --git a/SUPPORT.md b/SUPPORT.md
index 82239d0294..998faf5635 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -401,6 +401,13 @@ by maintaining multiple physical to machine (p2m) memory 
mappings.
 Status, x86 HVM: Tech Preview
 Status, ARM: Tech Preview
 
+### Cache coloring
+
+Allows to reserve Last Level Cache (LLC) partitions for Dom0, DomUs and Xen
+itself.
+
+Status, Arm64: Experimental
+
 ## Resource Management
 
 ### CPU Pools
diff --git a/docs/index.rst b/docs/index.rst
index 1d44796d72..1bb8d02ea3 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -66,6 +66,7 @@ Documents in need of some rearranging.
misc/xen-makefiles/makefiles
misra/index
fusa/index
+   misc/cache-coloring
 
 
 Miscellanea
diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
new file mode 100644
index 00..371f21a0e7
--- /dev/null
+++ b/docs/misc/cache-coloring.rst
@@ -0,0 +1,118 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Xen cache coloring user guide
+=
+
+The cache coloring support in Xen allows to reserve Last Level Cache (LLC)
+partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
+Cache coloring realizes per-set cache partitioning in software and is 
applicable
+to shared LLCs as implemented in Cortex-A53, Cortex-A72 and similar CPUs.
+
+To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
+
+If needed, change the maximum number of colors with
+``CONFIG_LLC_COLORS_ORDER=``.
+
+Runtime configuration is done via `Command line parameters`_.
+
+Background
+**
+
+Cache hierarchy of a modern multi-core CPU typically has first levels dedicated
+to each core (hence using multiple cache units), while the last level is shared
+among all of them. Such configuration implies tha

[PATCH v11 06/12] tools: add support for cache coloring configuration

2024-12-02 Thread Carlo Nonato
Add a new "llc_colors" parameter that defines the LLC color assignment for
a domain. The user can specify one or more color ranges using the same
syntax used everywhere else for color config described in the
documentation.
The parameter is defined as a list of strings that represent the color
ranges.

Documentation is also added.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
---
v11:
- turned unsigned int to uint32_t in xc_domain_set_llc_colors()
- return -1 in case of error instead of ENOMEM in xc_domain_set_llc_colors()
- added regenerated go bindings
v10:
- no changes
v9:
- turned warning into error in case of coloring not enabled
v8:
- warn the user in case of coloring not supported at hypervisor level
v7:
- removed unneeded NULL check before xc_hypercall_buffer_free() in
  xc_domain_set_llc_colors()
v6:
- no edits
v5:
- added LIBXL_HAVE_BUILDINFO_LLC_COLORS
- moved color configuration in xc_domain_set_llc_colors() cause of the new
  hypercall
v4:
- removed overlapping color ranges checks during parsing
- moved hypercall buffer initialization in libxenctrl
---
 docs/man/xl.cfg.5.pod.in |  6 +
 tools/golang/xenlight/helpers.gen.go | 16 
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/include/libxl.h|  5 
 tools/include/xenctrl.h  |  9 +++
 tools/libs/ctrl/xc_domain.c  | 34 +
 tools/libs/light/libxl_create.c  | 18 +
 tools/libs/light/libxl_types.idl |  1 +
 tools/xl/xl_parse.c  | 38 +++-
 9 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index ac3f88fd57..8e1422104e 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -3074,6 +3074,12 @@ raised.
 
 =over 4
 
+=item B
+
+Specify the Last Level Cache (LLC) color configuration for the guest.
+B can be either a single color value or a hypen-separated closed
+interval of colors (such as "0-4").
+
 =item B
 
 An optional integer parameter specifying the number of SPIs (Shared
diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index fe5110474d..90846ea8e8 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1097,6 +1097,14 @@ if err := x.Iomem[i].fromC(&v); err != nil {
 return fmt.Errorf("converting field Iomem: %v", err) }
 }
 }
+x.LlcColors = nil
+if n := int(xc.num_llc_colors); n > 0 {
+cLlcColors := (*[1<<28]C.uint32_t)(unsafe.Pointer(xc.llc_colors))[:n:n]
+x.LlcColors = make([]uint32, n)
+for i, v := range cLlcColors {
+x.LlcColors[i] = uint32(v)
+}
+}
 if err := x.ClaimMode.fromC(&xc.claim_mode);err != nil {
 return fmt.Errorf("converting field ClaimMode: %v", err)
 }
@@ -1453,6 +1461,14 @@ return fmt.Errorf("converting field Iomem: %v", err)
 }
 }
 }
+if numLlcColors := len(x.LlcColors); numLlcColors > 0 {
+xc.llc_colors = (*C.uint32_t)(C.malloc(C.size_t(numLlcColors*numLlcColors)))
+xc.num_llc_colors = C.int(numLlcColors)
+cLlcColors := 
(*[1<<28]C.uint32_t)(unsafe.Pointer(xc.llc_colors))[:numLlcColors:numLlcColors]
+for i,v := range x.LlcColors {
+cLlcColors[i] = C.uint32_t(v)
+}
+}
 if err := x.ClaimMode.toC(&xc.claim_mode); err != nil {
 return fmt.Errorf("converting field ClaimMode: %v", err)
 }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index c9e45b306f..e7667f1ce3 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -575,6 +575,7 @@ SchedParams DomainSchedParams
 Ioports []IoportRange
 Irqs []uint32
 Iomem []IomemRange
+LlcColors []uint32
 ClaimMode Defbool
 EventChannels uint32
 Kernel string
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 8d32428ea9..f8fe4afd7d 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1379,6 +1379,11 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, 
const libxl_mac *src);
  */
 #define LIBXL_HAVE_BUILDINFO_HVM_SYSTEM_FIRMWARE
 
+/*
+ * The libxl_domain_build_info has the llc_colors array.
+ */
+#define LIBXL_HAVE_BUILDINFO_LLC_COLORS 1
+
 /*
  * ERROR_REMUS_XXX error code only exists from Xen 4.5, Xen 4.6 and it
  * is changed to ERROR_CHECKPOINT_XXX in Xen 4.7
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 29617585c5..d6d93a2e8f 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2667,6 +2667,15 @@ int xc_livepatch_replace(xc_interface *xch, char *name, 
uint32_t timeout, uint32
 int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
  xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
 
+/*
+ * Set LLC colors for a domain.
+ * It can only be used directly after domain creation. An attempt to use it
+ * afterwards will result in an error.
+ */
+int xc_domain_set_llc_colors(xc_interface *xch, uint32_t domid,
+ const uint32_t *llc_color

Re: [PATCH v2 1/2] xen/swiotlb: add alignment check for dma buffers

2024-12-02 Thread Thierry Escande


On 02/12/2024 09:27, Jürgen Groß wrote:
> On 29.11.24 18:36, Thierry Escande wrote:
>> Hi,
>>
>> On 16/09/2024 08:47, Juergen Gross wrote:
>>> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
>>> index 35155258a7e2..ddf5b1df632e 100644
>>> --- a/drivers/xen/swiotlb-xen.c
>>> +++ b/drivers/xen/swiotlb-xen.c
>>> @@ -78,9 +78,15 @@ static inline int
>>> range_straddles_page_boundary(phys_addr_t p, size_t size)
>>>   {
>>>   unsigned long next_bfn, xen_pfn = XEN_PFN_DOWN(p);
>>>   unsigned int i, nr_pages = XEN_PFN_UP(xen_offset_in_page(p) +
>>> size);
>>> +    phys_addr_t algn = 1ULL << (get_order(size) + PAGE_SHIFT);
>>>     next_bfn = pfn_to_bfn(xen_pfn);
>>>   +    /* If buffer is physically aligned, ensure DMA alignment. */
>>> +    if (IS_ALIGNED(p, algn) &&
>>> +    !IS_ALIGNED(next_bfn << XEN_PAGE_SHIFT, algn))
>>> +    return 1;
>>
>> There is a regression in the mpt3sas driver because of this change.
>> When, in a dom0, this driver creates its DMA pool at probe time and
>> calls dma_pool_zalloc() (see [1]), the call to
>> range_straddles_page_boundary() (from xen_swiotlb_alloc_coherent())
>> returns 1 because of the alignment checks added by this patch. Then the
>> call to xen_create_contiguous_region() in xen_swiotlb_alloc_coherent()
>> fails because the passed size order is too big (> MAX_CONTIG_ORDER).
>> This driver sets the pool allocation block size to 2.3+ MBytes.
>>
>>  From previous discussions on the v1 patch, these checks are not
>> necessary from xen_swiotlb_alloc_coherent() that already ensures
>> alignment, right?
>
> It ensures alignment regarding guest physical memory, but it doesn't do
> so for machine memory.
>
> For DMA machine memory proper alignment might be needed, too, so the
> check is required. As an example, some crypto drivers seem to rely on
> proper machine memory alignment, which was the reason for those checks
> to be added.
>
> Possible solutions include:
>
> - rising the MAX_CONTIG_ORDER limit (to which value?)

Looks like the quick and easy solution. Bumping MAX_CONTIG_ORDER to 10
would allow 4MB pools, enough for this particular driver. I'll send a
patch if that sounds reasonable.

Regards,
Thierry

> - adding a way to allocate large DMA buffers with relaxed alignment
>   requirements (this will impact the whole DMA infrastructure plus
>   drivers like mp3sas which would need to use the new interface)
> - modify the mpt3sas driver to stay within the current limits
> - only guarantee proper machine memory alignment up to MAX_CONTIG_ORDER
>   (risking to introduce hard to diagnose bugs for the rare use cases of
>   such large buffers)
>
>
> Juergen




[PATCH v11 08/12] xen/page_alloc: introduce preserved page flags macro

2024-12-02 Thread Carlo Nonato
PGC_static and PGC_extra need to be preserved when assigning a page.
Define a new macro that groups those flags and use it instead of or'ing
every time.

Signed-off-by: Carlo Nonato 
---
v11:
- removed PGC_broken from PGC_preserved
- removed PGC preservation from mark_page_free()
v10:
- fixed commit message
v9:
- add PGC_broken to PGC_preserved
- clear PGC_extra in alloc_domheap_pages() only if MEMF_no_refcount is set
v8:
- fixed PGC_extra ASSERT fail in alloc_domheap_pages() by removing PGC_extra
  before freeing
v7:
- PGC_preserved used also in mark_page_free()
v6:
- preserved_flags renamed to PGC_preserved
- PGC_preserved is used only in assign_pages()
v5:
- new patch
---
 xen/common/page_alloc.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 55d561e93c..e73b404169 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -161,6 +161,7 @@
 #endif
 
 #define PGC_no_buddy_merge PGC_static
+#define PGC_preserved (PGC_extra | PGC_static)
 
 #ifndef PGT_TYPE_INFO_INITIALIZER
 #define PGT_TYPE_INFO_INITIALIZER 0
@@ -1447,8 +1448,7 @@ static bool mark_page_free(struct page_info *pg, mfn_t 
mfn)
 break;
 
 case PGC_state_offlining:
-pg->count_info = (pg->count_info & PGC_broken) |
- PGC_state_offlined;
+pg->count_info = (pg->count_info & PGC_broken) | PGC_state_offlined;
 pg_offlined = true;
 break;
 
@@ -2382,7 +2382,7 @@ int assign_pages(
 
 for ( i = 0; i < nr; i++ )
 {
-ASSERT(!(pg[i].count_info & ~(PGC_extra | PGC_static)));
+ASSERT(!(pg[i].count_info & ~PGC_preserved));
 if ( pg[i].count_info & PGC_extra )
 extra_pages++;
 }
@@ -2442,7 +2442,7 @@ int assign_pages(
 page_set_owner(&pg[i], d);
 smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
 pg[i].count_info =
-(pg[i].count_info & (PGC_extra | PGC_static)) | PGC_allocated | 1;
+(pg[i].count_info & PGC_preserved) | PGC_allocated | 1;
 
 page_list_add_tail(&pg[i], page_to_list(d, &pg[i]));
 }
@@ -2501,6 +2501,14 @@ struct page_info *alloc_domheap_pages(
 }
 if ( assign_page(pg, order, d, memflags) )
 {
+if ( memflags & MEMF_no_refcount )
+{
+unsigned long i;
+
+for ( i = 0; i < (1UL << order); i++ )
+pg[i].count_info &= ~PGC_extra;
+}
+
 free_heap_pages(pg, order, memflags & MEMF_no_scrub);
 return NULL;
 }
@@ -2555,6 +2563,7 @@ void free_domheap_pages(struct page_info *pg, unsigned 
int order)
 {
 ASSERT(d->extra_pages);
 d->extra_pages--;
+pg[i].count_info &= ~PGC_extra;
 }
 }
 
-- 
2.43.0




Re: [PATCH v2 1/2] xen/mpu: Map early uart when earlyprintk on

2024-12-02 Thread Julien Grall

Hi Ayan,

On 27/11/2024 18:39, Ayan Kumar Halder wrote:

CONFIG_EARLY_UART_SIZE is introduced to let user provide physical size of
early UART. Unlike MMU where we map a page in the virtual address space,
here we need to know the exact physical size to be mapped.
As VA == PA in case of MPU, the memory layout follows exactly the hardware
configuration. As a consequence, we set  EARLY_UART_VIRTUAL_ADDRESS as physical
address.

EARLY_UART_BASE_ADDRESS and EARLY_UART_SIZE should be aligned to the minimum
size of MPU region (ie 64 bits) as per the hardware restrictions. Refer ARM
DDI 0600A.d ID120821 A1.3 "A minimum protection region size of 64 bytes.".

UART is mapped as nGnRE region (as specified by ATTR=100 , refer G1.3.13,
MAIR_EL2, "---0100 Device memory nGnRE") and Doc ID - 102670_0101_02_en


I can't find the Doc you point online. Do you have a link?


Table 4-3, Armv8 architecture memory types (nGnRE - Corresponds to Device in
Armv7 architecture). Also, it is mapped as outer shareable, RW at EL2 only


I don't quite understand why you mention Armv7 here. The code you modify 
below is 64-bit so Armv8 only.


The code itself LGTM.

Cheers,

--
Julien Grall




Re: [PATCH v2 2/2] xen/mmu: enable SMMU subsystem only in MMU

2024-12-02 Thread Julien Grall

Hi Ayan,

On 27/11/2024 18:39, Ayan Kumar Halder wrote:

From: Penny Zheng 

In Xen, SMMU subsystem is supported for MMU system only. The reason being SMMU
driver uses the same page tables as MMU.
Thus, we make it dependent on CONFIG_MMU.

Signed-off-by: Penny Zheng 
Signed-off-by: Ayan Kumar Halder 


Acked-by: Julien Grall 

Cheers,

--
Julien Grall




[PATCH v11 03/12] xen/arm: permit non direct-mapped Dom0 construction

2024-12-02 Thread Carlo Nonato
Cache coloring requires Dom0 not to be direct-mapped because of its non
contiguous mapping nature, so allocate_memory() is needed in this case.
8d2c3ab18cc1 ("arm/dom0less: put dom0less feature code in a separate module")
moved allocate_memory() in dom0less_build.c. In order to use it
in Dom0 construction bring it back to domain_build.c and declare it in
domain_build.h.

Take the opportunity to adapt the implementation of allocate_memory() so
that it uses the host layout when called on the hwdom, via
find_unallocated_memory().

Signed-off-by: Carlo Nonato 
---
v11:
- GUEST_RAM_BANKS instead of hardcoding the number of banks in allocate_memory()
- hwdom_ext_regions -> hwdom_free_mem in allocate_memory()
- added a comment in allocate_memory() when skipping small banks
v10:
- fixed a compilation bug that happened when dom0less support was disabled
v9:
- no changes
v8:
- patch adapted to new changes to allocate_memory()
v7:
- allocate_memory() now uses the host layout when called on the hwdom
v6:
- new patch
---
 xen/arch/arm/dom0less-build.c   | 44 ---
 xen/arch/arm/domain_build.c | 97 -
 xen/arch/arm/include/asm/domain_build.h |  1 +
 3 files changed, 94 insertions(+), 48 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index d93a85434e..67b1503647 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -49,50 +49,6 @@ bool __init is_dom0less_mode(void)
 return ( !dom0found && domUfound );
 }
 
-static void __init allocate_memory(struct domain *d, struct kernel_info *kinfo)
-{
-struct membanks *mem = kernel_info_get_mem(kinfo);
-unsigned int i;
-paddr_t bank_size;
-
-printk(XENLOG_INFO "Allocating mappings totalling %ldMB for %pd:\n",
-   /* Don't want format this as PRIpaddr (16 digit hex) */
-   (unsigned long)(kinfo->unassigned_mem >> 20), d);
-
-mem->nr_banks = 0;
-bank_size = MIN(GUEST_RAM0_SIZE, kinfo->unassigned_mem);
-if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(GUEST_RAM0_BASE),
-   bank_size) )
-goto fail;
-
-bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);
-if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
-   bank_size) )
-goto fail;
-
-if ( kinfo->unassigned_mem )
-goto fail;
-
-for( i = 0; i < mem->nr_banks; i++ )
-{
-printk(XENLOG_INFO "%pd BANK[%d] %#"PRIpaddr"-%#"PRIpaddr" (%ldMB)\n",
-   d,
-   i,
-   mem->bank[i].start,
-   mem->bank[i].start + mem->bank[i].size,
-   /* Don't want format this as PRIpaddr (16 digit hex) */
-   (unsigned long)(mem->bank[i].size >> 20));
-}
-
-return;
-
-fail:
-panic("Failed to allocate requested domain memory."
-  /* Don't want format this as PRIpaddr (16 digit hex) */
-  " %ldKB unallocated. Fix the VMs configurations.\n",
-  (unsigned long)kinfo->unassigned_mem >> 10);
-}
-
 #ifdef CONFIG_VGICV2
 static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
 {
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 2c30792de8..2b8cba9b2f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -416,7 +416,6 @@ static void __init allocate_memory_11(struct domain *d,
 }
 }
 
-#ifdef CONFIG_DOM0LESS_BOOT
 bool __init allocate_domheap_memory(struct domain *d, paddr_t tot_size,
 alloc_domheap_mem_cb cb, void *extra)
 {
@@ -508,7 +507,6 @@ bool __init allocate_bank_memory(struct kernel_info *kinfo, 
gfn_t sgfn,
 
 return true;
 }
-#endif
 
 /*
  * When PCI passthrough is available we want to keep the
@@ -1003,6 +1001,94 @@ out:
 return res;
 }
 
+void __init allocate_memory(struct domain *d, struct kernel_info *kinfo)
+{
+struct membanks *mem = kernel_info_get_mem(kinfo);
+unsigned int i, nr_banks = GUEST_RAM_BANKS;
+paddr_t bank_start, bank_size;
+struct membanks *hwdom_free_mem = NULL;
+const uint64_t bankbase[] = GUEST_RAM_BANK_BASES;
+const uint64_t banksize[] = GUEST_RAM_BANK_SIZES;
+
+printk(XENLOG_INFO "Allocating mappings totalling %ldMB for %pd:\n",
+   /* Don't want format this as PRIpaddr (16 digit hex) */
+   (unsigned long)(kinfo->unassigned_mem >> 20), d);
+
+mem->nr_banks = 0;
+/*
+ * Use host memory layout for hwdom. Only case for this is when LLC 
coloring
+ * is enabled.
+ */
+if ( is_hardware_domain(d) )
+{
+ASSERT(llc_coloring_enabled);
+
+hwdom_free_mem = xzalloc_flex_struct(struct membanks, bank,
+ NR_MEM_BANKS);
+if ( !hwdom_free_mem )
+goto fail;
+
+hwdom_free_mem->max_banks = NR_MEM_BANKS;
+
+if ( find_unallocated_memory(kinfo, hwdom_free_mem) )
+goto fail;
+
+n

Re: [PATCH 3/5] build: arm64: provide -target and -march if using clang

2024-12-02 Thread Julien Grall

Hi Jan,

On 02/12/2024 07:52, Jan Beulich wrote:

On 30.11.2024 18:15, Julien Grall wrote:

On 29/11/2024 22:12, Volodymyr Babchuk wrote:

Jan Beulich  writes:

On 29.11.2024 02:49, Volodymyr Babchuk wrote:

--- a/config/arm64.mk
+++ b/config/arm64.mk
@@ -5,6 +5,10 @@ CONFIG_XEN_INSTALL_SUFFIX :=
   
   CFLAGS += #-marm -march= -mcpu= etc
   
+ifeq ($(clang),y)

+CFLAGS += -target aarch64 -march=armv8-a
+endif


Why is this dependent on (just?) $(clang), not (also?) $(llvm)?


Because this parameter is handled by clang only. There is no harm in
providing it explicitly. When building on arm64, value of this parameter
will match the default value for the platform. When building on x86, we
need to tell clang that it should generate arm64 code anyways. There is
no reason in trying to make ARM build with x86 instruction set.


Also
this affects both toolstack builds and hypervisor. Is applying -march
like this actually appropriate for the toolstack?


This is a good question. I can't see why this can't be appropriate for
toolstack. I.e. what bad can happen when building the toolstack.


In the future, we may want to build the tools for Armv8-M. So I think
the -march should also applies for the toolstack.


Perhaps I don't know enough of the Arm world, but: Wouldn't it be possible
to build a tool stack suitable for a wide range for Arm64 flavors, while
requiring more targeted hypervisor binaries?


Good question. There are some commonnality between ARMv8-M and ARMv8-R 
but I am not sure whether it means we could use -march=armv8-a build and 
run on ARMv8-M. Adding Ayan and Luca.


Cheers,

--
Julien Grall




Re: [PATCH v3 5/5] xen/arm: Move setup_frametable_mappings to arm/mmu

2024-12-02 Thread Julien Grall

Hi Luca,

On 29/11/2024 09:12, Luca Fancellu wrote:

diff --git a/xen/arch/arm/mmu/Makefile b/xen/arch/arm/mmu/Makefile
index 2cb44b857dd2..1c89602947de 100644
--- a/xen/arch/arm/mmu/Makefile
+++ b/xen/arch/arm/mmu/Makefile
@@ -1,3 +1,4 @@
+obj-y += mm.o
  obj-y += p2m.o
  obj-y += pt.o
  obj-y += setup.o
diff --git a/xen/arch/arm/mmu/mm.c b/xen/arch/arm/mmu/mm.c
new file mode 100644
index ..aefec908b7f2
--- /dev/null
+++ b/xen/arch/arm/mmu/mm.c
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0-only */


arch/arm/mm.c is GPL-2.0-or-later. AFAIU, there are still disagreements 
within the committers on whether we can drop the "or later" for existing 
code. So I would suggest to keep the same license.


The rest LGTM.

Cheers,

--
Julien Grall




Re: [PATCH v6 1/3] xen/arm: mpu: Create boot-time MPU protection regions

2024-12-02 Thread Julien Grall




On 02/12/2024 15:36, Ayan Kumar Halder wrote:


On 30/11/2024 17:45, Julien Grall wrote:

Hi Ayan,

Hi Julien,


On 18/11/2024 12:12, Ayan Kumar Halder wrote:

+/*
+ * Macro to prepare and set a EL2 MPU memory region.
+ * We will also create an according MPU memory region entry, which
+ * is a structure of pr_t,  in table \prmap.
+ *
+ * sel: region selector
+ * base:    reg storing base address
+ * limit:   reg storing limit address
+ * prbar:   store computed PRBAR_EL2 value
+ * prlar:   store computed PRLAR_EL2 value
+ * maxcount:    maximum number of EL2 regions supported
+ * attr_prbar:  PRBAR_EL2-related memory attributes. If not 
specified it will be

+ *  REGION_DATA_PRBAR
+ * attr_prlar:  PRLAR_EL2-related memory attributes. If not 
specified it will be

+ *  REGION_NORMAL_PRLAR
+ *
+ * Preserves \maxcount
+ * Clobbers \sel, \base, \limit, \prbar, \prlar


Per this line, "\sel" is clobbered. Which implies the caller cannot 
rely on the value. Yet ...



+ *
+ * Note that all parameters using registers should be distinct.
+ */
+.macro prepare_xen_region, sel, base, limit, prbar, prlar, maxcount, 
attr_prbar=REGION_DATA_PRBAR, attr_prlar=REGION_NORMAL_PRLAR

+    /* Check if the region is empty */
+    cmp   \base, \limit
+    beq   1f
+
+    /* Check if the number of regions exceeded the count specified 
in MPUIR_EL2 */

+    cmp   \sel, \maxcount
+    bge   fail_insufficient_regions
+
+    /* Prepare value for PRBAR_EL2 reg and preserve it in \prbar.*/
+    and   \base, \base, #MPU_REGION_MASK
+    mov   \prbar, #\attr_prbar
+    orr   \prbar, \prbar, \base
+
+    /* Limit address should be inclusive */
+    sub   \limit, \limit, #1
+    and   \limit, \limit, #MPU_REGION_MASK
+    mov   \prlar, #\attr_prlar
+    orr   \prlar, \prlar, \limit
+
+    msr   PRSELR_EL2, \sel
+    isb
+    msr   PRBAR_EL2, \prbar
+    msr   PRLAR_EL2, \prlar
+    dsb   sy
+    isb
+
+    add   \sel, \sel, #1
+
+1:
+.endm
+
+/*
+ * Failure caused due to insufficient MPU regions.
+ */
+FUNC_LOCAL(fail_insufficient_regions)
+    PRINT("- Selected MPU region is above the implemented number in 
MPUIR_EL2 -\r\n")

+1:  wfe
+    b   1b
+END(fail_insufficient_regions)
+
+/*
+ * Maps the various sections of Xen (described in xen.lds.S) as 
different MPU

+ * regions.
+ *
+ * Clobbers x0 - x5
+ *
+ */
+FUNC(enable_boot_cpu_mm)
+    /* Get the number of regions specified in MPUIR_EL2 */
+    mrs   x5, MPUIR_EL2
+    and   x5, x5, #NUM_MPU_REGIONS_MASK
+
+    /* x0: region sel */
+    mov   x0, xzr
+    /* Xen text section. */
+    ldr   x1, =_stext
+    ldr   x2, =_etext
+    prepare_xen_region x0, x1, x2, x3, x4, x5, 
attr_prbar=REGION_TEXT_PRBAR

+
+    /* Xen read-only data section. */
+    ldr   x1, =_srodata
+    ldr   x2, =_erodata
+    prepare_xen_region x0, x1, x2, x3, x4, x5, 
attr_prbar=REGION_RO_PRBAR



... you will pass x0 (\sel) without any update from the previous call. 
So effectively, you treat \sel as an input/output that will get 
incremented.


Therefore the comment needs to be updated. Possibly to:

"
output:

sel: Next available index in the MPU
"

Yes,  makes sense.


I will commit the series once we agree on the wording.


Yes, agreed with the wording.


Perfect. I have committed the series. I am looking forward for the rest 
of the MPU support :).


Cheers,

--
Julien Grall




Re: [PATCH 06/15] x86/hyperlaunch: introduce the domain builder

2024-12-02 Thread Jan Beulich
On 25.11.2024 18:52, Jason Andryuk wrote:
> On 2024-11-23 13:20, Daniel P. Smith wrote:
>> --- /dev/null
>> +++ b/xen/arch/x86/domain_builder/fdt.c
>> @@ -0,0 +1,38 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (C) 2024, Apertus Solutions, LLC
>> + */
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include  /* required for asm/setup.h */
> 
> Should asm/setup.h just be changed?

Why would it need changing (and why's that #include needed)? It has a
proper forward decl of the struct tag, and I can't see why it would need
anything else.

Jan



Re: [PATCH v2 4/4] xen: riscv: enable stack protector feature

2024-12-02 Thread oleksii . kurochko
On Mon, 2024-12-02 at 09:12 +0100, Jan Beulich wrote:
> On 30.11.2024 02:10, Volodymyr Babchuk wrote:
> > Enable previously added CONFIG_STACK_PROTECTOR feature for RISC-V
> > platform. Here we can call boot_stack_chk_guard_setup() in
> > start_xen()
> > function, because it never returns, so stack protector code will
> > not
> > be triggered because of changed canary.
> > 
> > Signed-off-by: Volodymyr Babchuk 
> > Tested-by: Oleksii Kurochko 
> 
> Isn't this premature? For ...
> 
> > @@ -57,6 +58,8 @@ void __init noreturn start_xen(unsigned long
> > bootcpu_id,
> >  if ( !boot_fdt_info(device_tree_flattened, dtb_addr) )
> >  BUG();
> >  
> > +    boot_stack_chk_guard_setup();
> 
> ... this function's use of get_random(), either arch_get_random()
> needs
> to be implemented, or (as Julien also pointed out for Arm64) NOW()
> needs
> to work. Yet get_s_time() presently expands to just BUG_ON(). Given
> this
> it's not even clear to me how Oleksii managed to actually test this.
Okay, I think I found what is the problem and why my testing on staging
wasn't really correct.

In xen/include/xen/stack_protector.h (
https://patchew.org/Xen/20241130010954.36057-1-volodymyr._5fbabc...@epam.com/20241130010954.36057-3-volodymyr._5fbabc...@epam.com/
) CONFIG_STACKPROTECTOR is used for ifdef-ing so the stub version of 
boot_stack_chk_guard_setup() is used and there is no need for
get_random() implementation:
1. Shouldn't it be /s/CONFIG_STACKPROTECTOR/CONFIG_STACK_PROTECTOR ?
2. CONFIG_STACK_PROTECTOR isn't set in case of RISC-V, at least:
  grep -Rni "STACK_PROTECTOR" xen/.config 
  38:CONFIG_HAS_STACK_PROTECTOR=y
  76:# CONFIG_STACK_PROTECTOR is not set
   
   Shouldn't it be default HAS_STACK_PROTECTOR ( or something similar )
   for 'config STACK_PROTECTOR':
  diff --git a/xen/common/Kconfig b/xen/common/Kconfig
  index 0ffd825510..f3156dbb9a 100644
  --- a/xen/common/Kconfig
  +++ b/xen/common/Kconfig
  @@ -521,6 +521,7 @@ config TRACEBUFFER
   
   config STACK_PROTECTOR
  bool "Stack protection"
  +   default HAS_STACK_PROTECTOR
  depends on HAS_STACK_PROTECTOR
  help
Use compiler's option -fstack-protector (supported both by
  GCC
  
With these changes, I can confirm Jan's statement that the BUG_ON()
occurs due to the absence of the get_random()/NOW() implementation.

My second test (on my downstream branch) wasn't relevant because
get_s_time() and NOW() are implemented there.


Aside from the changes I mentioned above, I can re-send this patch when
I submit the patch enabling get_s_time() and NOW() for RISC-V. If you,
Volodymyr, are okay with that, we can proceed in this way.

Best regards,
 Oleksii



Re: [PATCH 01/15] x86/boot: introduce boot domain

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> To begin moving toward allowing the hypervisor to construct more than one
> domain at boot, a container is needed for a domain's build information.
> Introduce a new header, , that contains the initial
> struct boot_domain that encapsulate the build information for a domain.

Why does this need to be a per-arch header? Wasn't one of the goals to unify
things as much as possible?

> --- a/xen/arch/x86/include/asm/dom0_build.h
> +++ b/xen/arch/x86/include/asm/dom0_build.h
> @@ -13,9 +13,9 @@ unsigned long dom0_compute_nr_pages(struct domain *d,
>  unsigned long initrd_len);
>  int dom0_setup_permissions(struct domain *d);
>  
> -struct boot_info;
> -int dom0_construct_pv(struct boot_info *bi, struct domain *d);
> -int dom0_construct_pvh(struct boot_info *bi, struct domain *d);
> +struct boot_domain;
> +int dom0_construct_pv(struct boot_domain *bd);
> +int dom0_construct_pvh(struct boot_domain *bd);

I'm wondering: Just a few commits ago you moved these to boot_info. Now you
move them to boot_domain. Why the extra churn, and what further transformations
are to be expected?

Jan



Re: [PATCH v7 04/10] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves

2024-12-02 Thread Jan Beulich
On 21.10.2024 17:45, Alejandro Vallejo wrote:
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -224,7 +224,7 @@ static void apic_setup(void)
>  
>  /* 8259A ExtInts are delivered through IOAPIC pin 0 (Virtual Wire Mode). 
> */
>  ioapic_write(0x10, APIC_DM_EXTINT);
> -ioapic_write(0x11, SET_APIC_ID(LAPIC_ID(0)));
> +ioapic_write(0x11, SET_APIC_ID(cpu_to_x2apicid[0]));
>  }

In uses like this or ...

> --- a/tools/firmware/hvmloader/mp_tables.c
> +++ b/tools/firmware/hvmloader/mp_tables.c
> @@ -198,8 +198,10 @@ static void fill_mp_config_table(struct mp_config_table 
> *mpct, int length)
>  /* fills in an MP processor entry for VCPU 'vcpu_id' */
>  static void fill_mp_proc_entry(struct mp_proc_entry *mppe, int vcpu_id)
>  {
> +ASSERT(cpu_to_x2apicid[vcpu_id] < 0xFF );
> +
>  mppe->type = ENTRY_TYPE_PROCESSOR;
> -mppe->lapic_id = LAPIC_ID(vcpu_id);
> +mppe->lapic_id = cpu_to_x2apicid[vcpu_id];

... this one (and also in acpi_lapic_id()), I consider the "x2" in the name
actively confusing, despite ...

> --- a/tools/firmware/hvmloader/smp.c
> +++ b/tools/firmware/hvmloader/smp.c
> @@ -29,7 +29,37 @@
>  
>  #include 
>  
> -static int ap_callin;
> +/**
> + * Lookup table of (x2)APIC IDs.
> + *
> + * Each entry is populated its respective CPU as they come online. This is 
> required
> + * for generating the MADT with minimal assumptions about ID relationships.
> + *
> + * While the name makes "x2" explicit, these may actually be xAPIC IDs if no
> + * x2APIC is present. "x2" merely highlights that each entry is 32 bits wide.
> + */
> +uint32_t cpu_to_x2apicid[HVM_MAX_VCPUS];

... the commentary here.

> +/** Tristate about x2apic being supported. -1=unknown */
> +static int has_x2apic = -1;

Why is this a tristate? Prior to the variable having been set, ...

> +static uint32_t read_apic_id(void)
> +{
> +uint32_t apic_id;
> +
> +if ( has_x2apic )
> +cpuid(0xb, NULL, NULL, NULL, &apic_id);

... this is bogus anyway.

Jan



Re: [PATCH v7 05/10] tools/libacpi: Use LUT of APIC IDs rather than function pointer

2024-12-02 Thread Jan Beulich
On 21.10.2024 17:45, Alejandro Vallejo wrote:
> Refactors libacpi so that a single LUT is the authoritative source of
> truth for the CPU to APIC ID mappings. This has a know-on effect in
> reducing complexity on future patches, as the same LUT can be used for
> configuring the APICs and configuring the ACPI tables for PVH.
> 
> Not functional change intended, because the same mappings are preserved.
> 
> Signed-off-by: Alejandro Vallejo 
> ---
> v7:
>   * NOTE: didn't add assert to libacpi as initially accepted in order to
> protect libvirt from an assert failure.

If such an assertion can trigger, doesn't that suggest a problem with the
corresponding caller? I.e. isn't omitting the assertion merely trading a
noisy failure for a silent (and possibly hard to understand) one?

Jan



[XEN PATCH v2] x86/hvm: Use constants for x86 modes

2024-12-02 Thread Teddy Astie
In many places of x86 HVM code, constants integer are used to indicate in what 
mode is
running the CPU (real, vm86, 16-bits, 32-bits, 64-bits). However, these 
constants are
are written directly as integer which hides the actual meaning of these modes.

This patch introduces X86_MODE_* macros and replace those occurences with it.

Signed-off-by Teddy Astie 
---
v2:
Formatting changes (alignment, ...)
Renamed v86 to vm86. (Jan)
---
 xen/arch/x86/hvm/emulate.c   | 18 ++
 xen/arch/x86/hvm/hypercall.c | 13 +++--
 xen/arch/x86/hvm/viridian/viridian.c |  9 +
 xen/arch/x86/hvm/vmx/vmx.c   |  9 +
 xen/arch/x86/hvm/vmx/vvmx.c  |  5 +++--
 xen/arch/x86/include/asm/hvm/hvm.h   |  6 ++
 6 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index f2bc6967df..9721fd8d4d 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2433,14 +2433,15 @@ static void cf_check hvmemul_put_fpu(
 
 switch ( mode )
 {
-case 8:
+case X86_MODE_64BIT:
 fpu_ctxt->fip.addr = aux->ip;
 if ( dval )
 fpu_ctxt->fdp.addr = aux->dp;
 fpu_ctxt->x[FPU_WORD_SIZE_OFFSET] = 8;
 break;
 
-case 4: case 2:
+case X86_MODE_32BIT:
+case X86_MODE_16BIT:
 fpu_ctxt->fip.offs = aux->ip;
 fpu_ctxt->fip.sel  = aux->cs;
 if ( dval )
@@ -2451,7 +2452,8 @@ static void cf_check hvmemul_put_fpu(
 fpu_ctxt->x[FPU_WORD_SIZE_OFFSET] = mode;
 break;
 
-case 0: case 1:
+case X86_MODE_REAL:
+case X86_MODE_VM86:
 fpu_ctxt->fip.addr = aux->ip | (aux->cs << 4);
 if ( dval )
 fpu_ctxt->fdp.addr = aux->dp | (aux->ds << 4);
@@ -2952,11 +2954,11 @@ static const char *guest_x86_mode_to_str(int mode)
 {
 switch ( mode )
 {
-case 0:  return "Real";
-case 1:  return "v86";
-case 2:  return "16bit";
-case 4:  return "32bit";
-case 8:  return "64bit";
+case X86_MODE_REAL:   return "Real";
+case X86_MODE_VM86:   return "v86";
+case X86_MODE_16BIT:  return "16bit";
+case X86_MODE_32BIT:  return "32bit";
+case X86_MODE_64BIT:  return "64bit";
 default: return "Unknown";
 }
 }
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 81883c8d4f..dd5b017fd1 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -112,23 +113,23 @@ int hvm_hypercall(struct cpu_user_regs *regs)
 
 switch ( mode )
 {
-case 8:
+case X86_MODE_64BIT:
 eax = regs->rax;
 fallthrough;
-case 4:
-case 2:
+case X86_MODE_32BIT:
+case X86_MODE_16BIT:
 if ( currd->arch.monitor.guest_request_userspace_enabled &&
 eax == __HYPERVISOR_hvm_op &&
-(mode == 8 ? regs->rdi : regs->ebx) == 
HVMOP_guest_request_vm_event )
+(mode == X86_MODE_64BIT ? regs->rdi : regs->ebx) == 
HVMOP_guest_request_vm_event )
 break;
 
 if ( likely(!hvm_get_cpl(curr)) )
 break;
 fallthrough;
-default:
+case X86_MODE_VM86:
 regs->rax = -EPERM;
 return HVM_HCALL_completed;
-case 0:
+case X86_MODE_REAL:
 break;
 }
 
diff --git a/xen/arch/x86/hvm/viridian/viridian.c 
b/xen/arch/x86/hvm/viridian/viridian.c
index 21480d9ee7..0e3b824bf0 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -933,13 +934,13 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 
 switch ( mode )
 {
-case 8:
+case X86_MODE_64BIT:
 input.raw = regs->rcx;
 input_params_gpa = regs->rdx;
 output_params_gpa = regs->r8;
 break;
 
-case 4:
+case X86_MODE_32BIT:
 input.raw = (regs->rdx << 32) | regs->eax;
 input_params_gpa = (regs->rbx << 32) | regs->ecx;
 output_params_gpa = (regs->rdi << 32) | regs->esi;
@@ -1038,11 +1039,11 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 
 switch ( mode )
 {
-case 8:
+case X86_MODE_64BIT:
 regs->rax = output.raw;
 break;
 
-case 4:
+case X86_MODE_32BIT:
 regs->rdx = output.raw >> 32;
 regs->rax = (uint32_t)output.raw;
 break;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b6885d0e27..eee1d4b47a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -886,14 +886,15 @@ int cf_check vmx_guest_x86_mode(struct vcpu *v)
 unsigned long cs_ar_bytes;
 
 if ( unlikely(!(v->arch.hvm.guest_cr[0] & X86_CR0_PE)) )
-return 0;
+return X86_MODE_REAL;
 if ( unli

Re: [PATCH 03/15] x86/boot: add cmdline to struct boot_domain

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -644,9 +644,11 @@ static bool __init check_and_adjust_load_address(
>  }
>  
>  static int __init pvh_load_kernel(
> -struct domain *d, struct boot_module *image, struct boot_module *initrd,
> -paddr_t *entry, paddr_t *start_info_addr)
> +struct boot_domain *bd, paddr_t *entry, paddr_t *start_info_addr)
>  {
> +struct domain *d = bd->d;
> +struct boot_module *image = bd->kernel;
> +struct boot_module *initrd = bd->ramdisk;
>  void *image_base = bootstrap_map_bm(image);
>  void *image_start = image_base + image->headroom;
>  unsigned long image_len = image->size;
> @@ -1304,14 +1306,12 @@ static void __hwdom_init pvh_setup_mmcfg(struct 
> domain *d)
>  int __init dom0_construct_pvh(struct boot_domain *bd)
>  {
>  paddr_t entry, start_info;
> -struct boot_module *image = bd->kernel;
> -struct boot_module *initrd = bd->ramdisk;
>  struct domain *d = bd->d;
>  int rc;
>  
>  printk(XENLOG_INFO "*** Building a PVH Dom%d ***\n", d->domain_id);
>  
> -if ( image == NULL )
> +if ( bd->kernel == NULL )
>  panic("Missing kernel boot module for %pd construction\n", d);
>  
>  if ( is_hardware_domain(d) )
> @@ -1351,7 +1351,7 @@ int __init dom0_construct_pvh(struct boot_domain *bd)
>  return rc;
>  }
>  
> -rc = pvh_load_kernel(d, image, initrd, &entry, &start_info);
> +rc = pvh_load_kernel(bd, &entry, &start_info);
>  if ( rc )
>  {
>  printk("Failed to load Dom0 kernel\n");

None of this looks command line related - do these changes rather belong into
patch 1?

Jan



Re: [PATCH 11/15] x86/hyperlaunch: add domain id parsing to domain config

2024-12-02 Thread Jan Beulich
On 26.11.2024 00:45, Jason Andryuk wrote:
> On 2024-11-23 13:20, Daniel P. Smith wrote:
>> @@ -186,6 +209,12 @@ static int __init process_domain_node(
>>   return -EFAULT;
>>   }
>>   
>> +if ( bd->domid == DOMID_INVALID )
>> +bd->domid = get_initial_domain_id();
>> +else
>> +if ( bd->domid != get_initial_domain_id() )
> 
> single line "else if"?

Yes.

>> +printk(XENLOG_WARNING "WARN: unsuported booting not using 
>> initial domid\n");
> 
> "unsupported"
> 
> Maybe "Booting without initial domid not supported"?

Plus the line then wants splitting after XENLOG_WARNING.

Jan



Re: [PATCH 12/15] x86/hyperlaunch: specify dom0 mode with device tree

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> --- a/xen/arch/x86/domain_builder/fdt.c
> +++ b/xen/arch/x86/domain_builder/fdt.c
> @@ -141,6 +141,25 @@ static int __init process_domain_node(
>  bd->domid = (domid_t)val;
>  printk("  domid: %d\n", bd->domid);
>  }
> +if ( match_fdt_property(fdt, prop, "mode" ) )
> +{
> +if ( fdt_prop_as_u32(prop, &bd->mode) != 0 )
> +{
> +printk("  failed processing mode for domain %s\n",
> +   name == NULL ? "unknown" : name);
> +return -EINVAL;
> +}
> +
> +printk("  mode: ");
> +if ( !(bd->mode & BUILD_MODE_PARAVIRT) ) {

Nit: Brace placement.

> +if ( bd->mode & BUILD_MODE_ENABLE_DM )
> +printk("HVM\n");
> +else
> +printk("PVH\n");
> +}
> +else
> +printk("PV\n");
> +}
>  }
>  
>  fdt_for_each_subnode(node, fdt, dom_node)
> --- a/xen/arch/x86/include/asm/bootdomain.h
> +++ b/xen/arch/x86/include/asm/bootdomain.h
> @@ -18,6 +18,12 @@ struct boot_domain {
>  
>  domid_t domid;
>  
> +  /* On | Off*/
> +#define BUILD_MODE_PARAVIRT  (1 << 0) /* PV | PVH/HVM */
> +#define BUILD_MODE_ENABLE_DM (1 << 1) /* HVM| PVH */
> +#define BUILD_MODE_LONG  (1 << 2) /* 64 BIT | 32 BIT  */

This last one isn't used anywhere, is it?

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1006,7 +1006,8 @@ static struct domain *__init create_dom0(struct 
> boot_info *bi)
>  struct boot_domain *bd = &bi->domains[0];
>  struct domain *d;
>  
> -if ( opt_dom0_pvh )
> +if ( opt_dom0_pvh ||
> + (bi->hyperlaunch_enabled && !(bd->mode & BUILD_MODE_PARAVIRT)) )
>  {
>  dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
> ((hvm_hap_supported() && !opt_dom0_shadow) ?

What about BUILD_MODE_ENABLE_DM?

Jan



Re: [PATCH 11/15] x86/hyperlaunch: add domain id parsing to domain config

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> --- a/xen/arch/x86/domain_builder/fdt.h
> +++ b/xen/arch/x86/domain_builder/fdt.h
> @@ -27,6 +27,24 @@ static inline int __init fdt_cell_as_u64(const fdt32_t 
> *cell, uint64_t *val)
>  return 0;
>  }
>  
> +static inline int __init fdt_prop_as_u32(
> +const struct fdt_property *prop, uint32_t *val)
> +{
> +if ( !prop || fdt32_to_cpu(prop->len) < sizeof(u32) )
> +return -EINVAL;
> +
> +return fdt_cell_as_u32((fdt32_t *)prop->data, val);
> +}
> +
> +static inline bool __init match_fdt_property(
> +const void *fdt, const struct fdt_property *prop, const char *s)
> +{
> +int slen, len = strlen(s);

Plain int isn't quite appropriate for strlen()'s return. It doesn't strictly
need to be size_t, but it should be at least unsigned int.

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1020,7 +1020,8 @@ static struct domain *__init create_dom0(struct 
> boot_info *bi)
>  dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
>  
>  /* Create initial domain.  Not d0 for pvshim. */
> -bd->domid = get_initial_domain_id();
> +if ( bd->domid == DOMID_INVALID )
> +bd->domid = get_initial_domain_id();

Looks like the comment then wants to move, too.

Jan



Re: [PATCH 12/15] x86/hyperlaunch: specify dom0 mode with device tree

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> --- a/xen/arch/x86/domain_builder/fdt.c
> +++ b/xen/arch/x86/domain_builder/fdt.c
> @@ -141,6 +141,25 @@ static int __init process_domain_node(
>  bd->domid = (domid_t)val;
>  printk("  domid: %d\n", bd->domid);
>  }
> +if ( match_fdt_property(fdt, prop, "mode" ) )
> +{
> +if ( fdt_prop_as_u32(prop, &bd->mode) != 0 )
> +{
> +printk("  failed processing mode for domain %s\n",
> +   name == NULL ? "unknown" : name);
> +return -EINVAL;
> +}
> +
> +printk("  mode: ");
> +if ( !(bd->mode & BUILD_MODE_PARAVIRT) ) {
> +if ( bd->mode & BUILD_MODE_ENABLE_DM )
> +printk("HVM\n");
> +else
> +printk("PVH\n");
> +}
> +else
> +printk("PV\n");

Oh, and: What about BUILD_MODE_ENABLE_DM also being set here?

Jan



Re: [PATCH 13/15] x86/hyperlaunch: add memory parsing to domain config

2024-12-02 Thread Jan Beulich
On 23.11.2024 19:20, Daniel P. Smith wrote:
> @@ -160,6 +161,42 @@ static int __init process_domain_node(
>  else
>  printk("PV\n");
>  }
> +if ( match_fdt_property(fdt, prop, "memory" ) )
> +{
> +uint64_t kb;
> +if ( fdt_prop_as_u64(prop, &kb) != 0 )
> +{
> +printk("  failed processing memory for domain %s\n",
> +   name == NULL ? "unknown" : name);
> +return -EINVAL;
> +}
> +bd->mem_pages = PFN_DOWN(kb * SZ_1K);
> +printk("  memory: %ld\n", bd->mem_pages << PAGE_SHIFT);
> +}
> +if ( match_fdt_property(fdt, prop, "mem-min" ) )
> +{
> +uint64_t kb;
> +if ( fdt_prop_as_u64(prop, &kb) != 0 )
> +{
> +printk("  failed processing memory for domain %s\n",
> +   name == NULL ? "unknown" : name);
> +return -EINVAL;
> +}
> +bd->min_pages = PFN_DOWN(kb * SZ_1K);
> +printk("  min memory: %ld\n", bd->min_pages << PAGE_SHIFT);
> +}
> +if ( match_fdt_property(fdt, prop, "mem-max" ) )
> +{
> +uint64_t kb;
> +if ( fdt_prop_as_u64(prop, &kb) != 0 )
> +{
> +printk("  failed processing memory for domain %s\n",
> +   name == NULL ? "unknown" : name);
> +return -EINVAL;
> +}
> +bd->max_pages = PFN_DOWN(kb * SZ_1K);
> +printk("  max memory: %ld\n", bd->max_pages << PAGE_SHIFT);
> +}

Since the values logged are all multiples of 1k, why make reading the logs
more complicated by logging byte-granular values? I instead wonder whether
converting to more coarse grained values (leaving, say, between 4 and 6
significant digits while using kb, Mb, Gb, etc) wouldn't be yet better.

Jan



[PATCH v11 10/12] xen/arm: add Xen cache colors command line parameter

2024-12-02 Thread Carlo Nonato
From: Luca Miccio 

Add a new command line parameter to configure Xen cache colors.
These colors are dumped together with other coloring info.

Benchmarking the VM interrupt response time provides an estimation of
LLC usage by Xen's most latency-critical runtime task. Results on Arm
Cortex-A53 on Xilinx Zynq UltraScale+ XCZU9EG show that one color, which
reserves 64 KiB of L2, is enough to attain best responsiveness:
- Xen 1 color latency:  3.1 us
- Xen 2 color latency:  3.1 us

Since this is the most common target for Arm cache coloring, the default
amount of Xen colors is set to one.

More colors are instead very likely to be needed on processors whose L1
cache is physically-indexed and physically-tagged, such as Cortex-A57.
In such cases, coloring applies to L1 also, and there typically are two
distinct L1-colors. Therefore, reserving only one color for Xen would
senselessly partitions a cache memory that is already private, i.e.
underutilize it.

Signed-off-by: Luca Miccio 
Signed-off-by: Marco Solieri 
Signed-off-by: Carlo Nonato 
Reviewed-by: Jan Beulich 
---
v11:
- no changes
v10:
- no changes
v9:
- no changes
v8:
- added bound check on xen_colors in llc_coloring_init()
v7:
- removed XEN_DEFAULT_COLOR
- XEN_DEFAULT_NUM_COLORS is now used in a for loop to set xen default colors
---
 docs/misc/cache-coloring.rst  |  2 ++
 docs/misc/xen-command-line.pandoc | 10 ++
 xen/common/llc-coloring.c | 29 +
 3 files changed, 41 insertions(+)

diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
index 5224b27afe..e156062aa2 100644
--- a/docs/misc/cache-coloring.rst
+++ b/docs/misc/cache-coloring.rst
@@ -115,6 +115,8 @@ Specific documentation is available at 
`docs/misc/xen-command-line.pandoc`.
 +--+---+
 | ``buddy-alloc-size`` | Buddy allocator reserved size |
 +--+---+
+| ``xen-llc-colors``   | Xen color configuration   |
++--+---+
 
 Colors selection format
 ***
diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 3a70c49c05..992e1f993e 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2923,6 +2923,16 @@ mode.
 **WARNING: `x2apic_phys` is deprecated and superseded by `x2apic-mode`.
 The latter takes precedence if both are set.**
 
+### xen-llc-colors (arm64)
+> `= List of [  | - ]`
+
+> Default: `0: the lowermost color`
+
+Specify Xen LLC color configuration. This options is available only when
+`CONFIG_LLC_COLORING` is enabled.
+Two colors are most likely needed on platforms where private caches are
+physically indexed, e.g. the L1 instruction cache of the Arm Cortex-A57.
+
 ### xenheap_megabytes (arm32)
 > `= `
 
diff --git a/xen/common/llc-coloring.c b/xen/common/llc-coloring.c
index 0f22a9b72c..2e7c0f505d 100644
--- a/xen/common/llc-coloring.c
+++ b/xen/common/llc-coloring.c
@@ -11,6 +11,7 @@
 #include 
 
 #define NR_LLC_COLORS  (1U << CONFIG_LLC_COLORS_ORDER)
+#define XEN_DEFAULT_NUM_COLORS 1
 
 /*
  * -1: not specified (disabled unless llc-size and llc-nr-ways present)
@@ -32,6 +33,9 @@ static unsigned int __ro_after_init 
default_colors[NR_LLC_COLORS];
 static unsigned int __initdata dom0_colors[NR_LLC_COLORS];
 static unsigned int __initdata dom0_num_colors;
 
+static unsigned int __ro_after_init xen_colors[NR_LLC_COLORS];
+static unsigned int __ro_after_init xen_num_colors;
+
 #define mfn_color_mask  (max_nr_colors - 1)
 #define mfn_to_color(mfn)   (mfn_x(mfn) & mfn_color_mask)
 
@@ -90,6 +94,13 @@ static int __init parse_dom0_colors(const char *s)
 }
 custom_param("dom0-llc-colors", parse_dom0_colors);
 
+static int __init parse_xen_colors(const char *s)
+{
+return parse_color_config(s, xen_colors, ARRAY_SIZE(xen_colors),
+  &xen_num_colors);
+}
+custom_param("xen-llc-colors", parse_xen_colors);
+
 static void print_colors(const unsigned int colors[], unsigned int num_colors)
 {
 unsigned int i;
@@ -173,6 +184,22 @@ void __init llc_coloring_init(void)
 for ( i = 0; i < max_nr_colors; i++ )
 default_colors[i] = i;
 
+if ( !xen_num_colors )
+{
+unsigned int i;
+
+xen_num_colors = MIN(XEN_DEFAULT_NUM_COLORS, max_nr_colors);
+
+printk(XENLOG_WARNING
+   "Xen LLC color config not found. Using first %u colors\n",
+   xen_num_colors);
+for ( i = 0; i < xen_num_colors; i++ )
+xen_colors[i] = i;
+}
+else if ( xen_num_colors > max_nr_colors ||
+  !check_colors(xen_colors, xen_num_colors) )
+panic("Bad LLC color config for Xen\n");
+
 arch_llc_coloring_init();
 }
 
@@ -183,6 +210,8 @@ void dump_llc_coloring_info(void)
 
 printk("LLC coloring info:\n");
 printk("Number of LLC colors supported: %u\n", max_nr_colo

[PATCH v11 12/12] xen/arm: add cache coloring support for Xen image

2024-12-02 Thread Carlo Nonato
Xen image is relocated to a new colored physical space. Some relocation
functionalities must be brought back:
- the virtual address of the new space is taken from 0c18fb76323b
  ("xen/arm: Remove unused BOOT_RELOC_VIRT_START").
- relocate_xen() and get_xen_paddr() are taken from f60658c6ae47
  ("xen/arm: Stop relocating Xen").

setup_pagetables() must be adapted for coloring and for relocation. Runtime
page tables are used to map the colored space, but they are also linked in
boot tables so that the new space is temporarily available for relocation.
This implies that Xen protection must happen after the copy.

Finally, since the alternative framework needs to remap the Xen text and
inittext sections, this operation must be done in a coloring-aware way.
The function xen_remap_colored() is introduced for that.

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
Reviewed-by: Jan Beulich  # common
---
v11:
- else if -> if in xen_colored_mfn()
v10:
- no changes
v9:
- patch adapted to changes to setup_pagetables()
v8:
- moved xen_colored_map_size() to arm/llc-coloring.c
v7:
- added BUG_ON() checks to arch_llc_coloring_init() and
  create_llc_coloring_mappings()
v6:
- squashed with BOOT_RELOC_VIRT_START patch
- consider_modules() moved in another patch
- removed psci and smpboot code because of new idmap work already handles that
- moved xen_remap_colored() in alternative.c since it's only used there
- removed xen_colored_temp[] in favor of xen_xenmap[] usage for mapping
- use of boot_module_find_by_kind() to remove the need of extra parameter in
  setup_pagetables()
- moved get_xen_paddr() in arm/llc-coloring.c since it's only used there
v5:
- FIXME: consider_modules copy pasted since it got moved
v4:
- removed set_value_for_secondary() because it was wrongly cleaning cache
- relocate_xen() now calls switch_ttbr_id()
---
 xen/arch/arm/alternative.c| 30 +++-
 xen/arch/arm/arm64/mmu/head.S | 58 +++-
 xen/arch/arm/arm64/mmu/mm.c   | 28 ++--
 xen/arch/arm/include/asm/mmu/layout.h |  3 +
 xen/arch/arm/llc-coloring.c   | 63 +
 xen/arch/arm/mmu/setup.c  | 99 +++
 xen/arch/arm/setup.c  | 10 ++-
 xen/common/llc-coloring.c | 18 +
 xen/include/xen/llc-coloring.h| 14 
 9 files changed, 302 insertions(+), 21 deletions(-)

diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c
index d99b507093..0fcf4e451d 100644
--- a/xen/arch/arm/alternative.c
+++ b/xen/arch/arm/alternative.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -191,6 +192,27 @@ static int __apply_alternatives_multi_stop(void *xenmap)
 return 0;
 }
 
+static void __init *xen_remap_colored(mfn_t xen_mfn, paddr_t xen_size)
+{
+unsigned int i;
+void *xenmap;
+mfn_t *xen_colored_mfns, mfn;
+
+xen_colored_mfns = xmalloc_array(mfn_t, xen_size >> PAGE_SHIFT);
+if ( !xen_colored_mfns )
+panic("Can't allocate LLC colored MFNs\n");
+
+for_each_xen_colored_mfn ( xen_mfn, mfn, i )
+{
+xen_colored_mfns[i] = mfn;
+}
+
+xenmap = vmap(xen_colored_mfns, xen_size >> PAGE_SHIFT);
+xfree(xen_colored_mfns);
+
+return xenmap;
+}
+
 /*
  * This function should only be called during boot and before CPU0 jump
  * into the idle_loop.
@@ -209,8 +231,12 @@ void __init apply_alternatives_all(void)
  * The text and inittext section are read-only. So re-map Xen to
  * be able to patch the code.
  */
-xenmap = __vmap(&xen_mfn, 1U << xen_order, 1, 1, PAGE_HYPERVISOR,
-VMAP_DEFAULT);
+if ( llc_coloring_enabled )
+xenmap = xen_remap_colored(xen_mfn, xen_size);
+else
+xenmap = __vmap(&xen_mfn, 1U << xen_order, 1, 1, PAGE_HYPERVISOR,
+VMAP_DEFAULT);
+
 /* Re-mapping Xen is not expected to fail during boot. */
 BUG_ON(!xenmap);
 
diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
index 665a51a337..a1fc9a82f1 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -428,6 +428,61 @@ FUNC_LOCAL(fail)
 b 1b
 END(fail)
 
+/*
+ * Copy Xen to new location and switch TTBR
+ * x0ttbr
+ * x1source address
+ * x2destination address
+ * x3length
+ *
+ * Source and destination must be word aligned, length is rounded up
+ * to a 16 byte boundary.
+ *
+ * MUST BE VERY CAREFUL when saving things to RAM over the copy
+ */
+ENTRY(relocate_xen)
+/*
+ * Copy 16 bytes at a time using:
+ *   x9: counter
+ *   x10: data
+ *   x11: data
+ *   x12: source
+ *   x13: destination
+ */
+mov x9, x3
+mov x12, x1
+mov x13, x2
+
+1:  ldp x10, x11, [x12], #16
+stp x10, x11, [x13], #16
+
+subsx9, x9, #16
+bgt 1b
+
+/*
+ * Flush d

[PATCH v11 04/12] xen/arm: add Dom0 cache coloring support

2024-12-02 Thread Carlo Nonato
Add a command line parameter to allow the user to set the coloring
configuration for Dom0.
A common configuration syntax for cache colors is introduced and
documented.
Take the opportunity to also add:
 - default configuration notion.
 - function to check well-formed configurations.

Direct mapping Dom0 isn't possible when coloring is enabled, so
CDF_directmap flag is removed when creating it.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
Reviewed-by: Jan Beulich 
Reviewed-by: Michal Orzel 
---
v11:
- minor changes
v10:
- fixed array type for colors parameter in check_colors()
v9:
- moved domain_llc_coloring_free() in next patch cause it's never used for dom0
v8:
- added bound check on dom0_num_colors
- default colors array set just once
v7:
- parse_color_config() doesn't accept leading/trailing commas anymore
- removed alloc_colors() helper
v6:
- moved domain_llc_coloring_free() in this patch
- removed domain_alloc_colors() in favor of a more explicit allocation
- parse_color_config() now accepts the size of the array to be filled
- allocate_memory() moved in another patch
v5:
- Carlo Nonato as the new author
- moved dom0 colors parsing (parse_colors()) in this patch
- added dom0_set_llc_colors() to set dom0 colors after creation
- moved color allocation and checking in this patch
- error handling when allocating color arrays
- FIXME: copy pasted allocate_memory() cause it got moved
v4:
- dom0 colors are dynamically allocated as for any other domain
  (colors are duplicated in dom0_colors and in the new array, but logic
  is simpler)
---
 docs/misc/cache-coloring.rst  |  29 
 docs/misc/xen-command-line.pandoc |   9 +++
 xen/arch/arm/domain_build.c   |  10 ++-
 xen/common/llc-coloring.c | 120 +-
 xen/include/xen/llc-coloring.h|   1 +
 5 files changed, 167 insertions(+), 2 deletions(-)

diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
index 12972dbb2c..7b47d0ed92 100644
--- a/docs/misc/cache-coloring.rst
+++ b/docs/misc/cache-coloring.rst
@@ -107,6 +107,35 @@ Specific documentation is available at 
`docs/misc/xen-command-line.pandoc`.
 +--+---+
 | ``llc-nr-ways``  | Set the LLC number of ways|
 +--+---+
+| ``dom0-llc-colors``  | Dom0 color configuration  |
++--+---+
+
+Colors selection format
+***
+
+Regardless of the memory pool that has to be colored (Xen, Dom0/DomUs),
+the color selection can be expressed using the same syntax. In particular a
+comma-separated list of colors or ranges of colors is used.
+Ranges are hyphen-separated intervals (such as `0-4`) and are inclusive on both
+sides.
+
+Note that:
+
+- no spaces are allowed between values.
+- no overlapping ranges or duplicated colors are allowed.
+- values must be written in ascending order.
+
+Examples:
+
++---+-+
+| **Configuration** | **Actual selection**|
++---+-+
+| 1-2,5-8   | [1, 2, 5, 6, 7, 8]  |
++---+-+
+| 4-8,10,11,12  | [4, 5, 6, 7, 8, 10, 11, 12] |
++---+-+
+| 0 | [0] |
++---+-+
 
 Auto-probing of LLC specs
 #
diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index abd8dae96f..bfdc8b0002 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -963,6 +963,15 @@ Controls for the dom0 IOMMU setup.
 
 Specify a list of IO ports to be excluded from dom0 access.
 
+### dom0-llc-colors (arm64)
+> `= List of [  | - ]`
+
+> Default: `All available LLC colors`
+
+Specify dom0 LLC color configuration. This option is available only when
+`CONFIG_LLC_COLORING` is enabled. If the parameter is not set, all available
+colors are used.
+
 ### dom0_max_vcpus
 
 Either:
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 2b8cba9b2f..83d7585e7e 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2285,6 +2286,7 @@ void __init create_dom0(void)
 .max_maptrack_frames = -1,
 .grant_opts = XEN_DOMCTL_GRANT_version(opt_gnttab_max_version),
 };
+unsigned int flags = CDF_privileged;
 int rc;
 
 /* The vGIC for DOM0 is exactly emulating the hardware GIC */
@@ -2312,10 +2314,16 @@ void __init create_dom0(void)
 panic("SVE vector length error\n");
 }
 
-dom0 = domain_create(0, &dom0_cfg, CDF_privileged | CDF_directmap);
+if ( !llc_coloring_enabled )
+flags |= CDF_directmap;
+
+  

[PATCH v11 02/12] xen/arm: add initial support for LLC coloring on arm64

2024-12-02 Thread Carlo Nonato
LLC coloring needs to know the last level cache layout in order to make the
best use of it. This can be probed by inspecting the CLIDR_EL1 register,
so the Last Level is defined as the last level visible by this register.
Note that this excludes system caches in some platforms.

Static memory allocation and cache coloring are incompatible because static
memory can't be guaranteed to use only colors assigned to the domain.
Panic during DomUs creation when both are enabled.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
Reviewed-by: Michal Orzel 
Acked-by: Jan Beulich 
---
v11:
- removed useless #define from processor.h
v10:
- moved CONFIG_NUMA check in arch/arm/Kconfig
v9:
- no changes
v8:
- no changes
v7:
- only minor changes
v6:
- get_llc_way_size() now checks for at least separate I/D caches
v5:
- used - instead of _ for filenames
- moved static-mem check in this patch
- moved dom0 colors parsing in next patch
- moved color allocation and configuration in next patch
- moved check_colors() in next patch
- colors are now printed in short form
v4:
- added "llc-coloring" cmdline option for the boot-time switch
- dom0 colors are now checked during domain init as for any other domain
- fixed processor.h masks bit width
- check for overflow in parse_color_config()
- check_colors() now checks also that colors are sorted and unique
---
 docs/misc/cache-coloring.rst | 14 +
 xen/arch/arm/Kconfig |  1 +
 xen/arch/arm/Makefile|  1 +
 xen/arch/arm/dom0less-build.c|  6 +++
 xen/arch/arm/include/asm/processor.h | 15 ++
 xen/arch/arm/llc-coloring.c  | 79 
 xen/arch/arm/setup.c |  3 ++
 xen/common/llc-coloring.c|  2 +-
 xen/include/xen/llc-coloring.h   |  4 ++
 9 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/llc-coloring.c

diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
index 371f21a0e7..12972dbb2c 100644
--- a/docs/misc/cache-coloring.rst
+++ b/docs/misc/cache-coloring.rst
@@ -113,6 +113,20 @@ Auto-probing of LLC specs
 
 LLC size and number of ways are probed automatically by default.
 
+In the Arm implementation, this is done by inspecting the CLIDR_EL1 register.
+This means that other system caches that aren't visible there are ignored.
+
 LLC specs can be manually set via the above command line parameters. This
 bypasses any auto-probing and it's used to overcome failing situations, such as
 flawed probing logic, or for debugging/testing purposes.
+
+Known issues and limitations
+
+
+"xen,static-mem" isn't supported when coloring is enabled
+#
+
+In the domain configuration, "xen,static-mem" allows memory to be statically
+allocated to the domain. This isn't possible when LLC coloring is enabled,
+because that memory can't be guaranteed to use only colors assigned to the
+domain.
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 23bbc91aad..4ec9ef8334 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -8,6 +8,7 @@ config ARM_64
depends on !ARM_32
select 64BIT
select HAS_FAST_MULTIPLY
+   select HAS_LLC_COLORING if !NUMA
 
 config ARM
def_bool y
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index e4ad1ce851..ccbfc61f88 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -35,6 +35,7 @@ obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
 obj-y += irq.o
 obj-y += kernel.init.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
+obj-$(CONFIG_LLC_COLORING) += llc-coloring.o
 obj-$(CONFIG_MEM_ACCESS) += mem_access.o
 obj-y += mm.o
 obj-y += monitor.o
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index f328a044e9..d93a85434e 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -890,7 +891,12 @@ void __init create_domUs(void)
 panic("No more domain IDs available\n");
 
 if ( dt_find_property(node, "xen,static-mem", NULL) )
+{
+if ( llc_coloring_enabled )
+panic("LLC coloring and static memory are incompatible\n");
+
 flags |= CDF_staticmem;
+}
 
 if ( dt_property_read_bool(node, "direct-map") )
 {
diff --git a/xen/arch/arm/include/asm/processor.h 
b/xen/arch/arm/include/asm/processor.h
index 8e02410465..60b587db69 100644
--- a/xen/arch/arm/include/asm/processor.h
+++ b/xen/arch/arm/include/asm/processor.h
@@ -18,6 +18,21 @@
 #define CTR_IDC_SHIFT   28
 #define CTR_DIC_SHIFT   29
 
+/* CCSIDR Current Cache Size ID Register */
+#define CCSIDR_LINESIZE_MASK_AC(0x7, UL)
+#define CCSIDR_NUMSETS_SHIFT13
+#define CCSIDR_NUMSETS_MASK _AC(0x3fff, UL)
+#define CCSIDR_NUMS

Re: [PATCH v11 08/12] xen/page_alloc: introduce preserved page flags macro

2024-12-02 Thread Carlo Nonato
Hi all,

On Mon, Dec 2, 2024 at 5:59 PM Carlo Nonato
 wrote:
>
> PGC_static and PGC_extra need to be preserved when assigning a page.
> Define a new macro that groups those flags and use it instead of or'ing
> every time.
>
> Signed-off-by: Carlo Nonato 
> ---
> v11:
> - removed PGC_broken from PGC_preserved
> - removed PGC preservation from mark_page_free()
> v10:
> - fixed commit message
> v9:
> - add PGC_broken to PGC_preserved
> - clear PGC_extra in alloc_domheap_pages() only if MEMF_no_refcount is set
> v8:
> - fixed PGC_extra ASSERT fail in alloc_domheap_pages() by removing PGC_extra
>   before freeing
> v7:
> - PGC_preserved used also in mark_page_free()
> v6:
> - preserved_flags renamed to PGC_preserved
> - PGC_preserved is used only in assign_pages()
> v5:
> - new patch
> ---
>  xen/common/page_alloc.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 55d561e93c..e73b404169 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -161,6 +161,7 @@
>  #endif
>
>  #define PGC_no_buddy_merge PGC_static
> +#define PGC_preserved (PGC_extra | PGC_static)
>
>  #ifndef PGT_TYPE_INFO_INITIALIZER
>  #define PGT_TYPE_INFO_INITIALIZER 0
> @@ -1447,8 +1448,7 @@ static bool mark_page_free(struct page_info *pg, mfn_t 
> mfn)
>  break;
>
>  case PGC_state_offlining:
> -pg->count_info = (pg->count_info & PGC_broken) |
> - PGC_state_offlined;
> +pg->count_info = (pg->count_info & PGC_broken) | PGC_state_offlined;
>  pg_offlined = true;
>  break;
>
> @@ -2382,7 +2382,7 @@ int assign_pages(
>
>  for ( i = 0; i < nr; i++ )
>  {
> -ASSERT(!(pg[i].count_info & ~(PGC_extra | PGC_static)));
> +ASSERT(!(pg[i].count_info & ~PGC_preserved));
>  if ( pg[i].count_info & PGC_extra )
>  extra_pages++;
>  }
> @@ -2442,7 +2442,7 @@ int assign_pages(
>  page_set_owner(&pg[i], d);
>  smp_wmb(); /* Domain pointer must be visible before updating refcnt. 
> */
>  pg[i].count_info =
> -(pg[i].count_info & (PGC_extra | PGC_static)) | PGC_allocated | 
> 1;
> +(pg[i].count_info & PGC_preserved) | PGC_allocated | 1;
>
>  page_list_add_tail(&pg[i], page_to_list(d, &pg[i]));
>  }
> @@ -2501,6 +2501,14 @@ struct page_info *alloc_domheap_pages(
>  }
>  if ( assign_page(pg, order, d, memflags) )
>  {
> +if ( memflags & MEMF_no_refcount )
> +{
> +unsigned long i;
> +
> +for ( i = 0; i < (1UL << order); i++ )
> +pg[i].count_info &= ~PGC_extra;
> +}
> +
>  free_heap_pages(pg, order, memflags & MEMF_no_scrub);
>  return NULL;
>  }
> @@ -2555,6 +2563,7 @@ void free_domheap_pages(struct page_info *pg, unsigned 
> int order)
>  {
>  ASSERT(d->extra_pages);
>  d->extra_pages--;
> +pg[i].count_info &= ~PGC_extra;
>  }
>  }
>
> --
> 2.43.0
>

Sorry guys, this patch is wrong.
Here's the correct one.

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 55d561e93c..1c0990b180 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -161,6 +161,7 @@
 #endif

 #define PGC_no_buddy_merge PGC_static
+#define PGC_preserved (PGC_extra | PGC_static)

 #ifndef PGT_TYPE_INFO_INITIALIZER
 #define PGT_TYPE_INFO_INITIALIZER 0
@@ -2382,7 +2383,7 @@ int assign_pages(

 for ( i = 0; i < nr; i++ )
 {
-ASSERT(!(pg[i].count_info & ~(PGC_extra | PGC_static)));
+ASSERT(!(pg[i].count_info & ~PGC_preserved));
 if ( pg[i].count_info & PGC_extra )
 extra_pages++;
 }
@@ -2442,7 +2443,7 @@ int assign_pages(
 page_set_owner(&pg[i], d);
 smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
 pg[i].count_info =
-(pg[i].count_info & (PGC_extra | PGC_static)) | PGC_allocated | 1;
+(pg[i].count_info & PGC_preserved) | PGC_allocated | 1;

 page_list_add_tail(&pg[i], page_to_list(d, &pg[i]));
 }

Sorry about that.
Thanks.



Re: [PATCH] xen/arm32: Get rid of __memzero()

2024-12-02 Thread Julien Grall

Hi Jan,

On 27/11/2024 11:09, Jan Beulich wrote:

On 27.11.2024 11:55, Julien Grall wrote:

From: Julien Grall 

All the code in arch/arm32/lib/ where copied from Linux 3.16
and never re-synced since then.

A few years ago, Linux got rid of __memzero() because the implementation
is very similar to memset(p,0,n) and the current use of __memzero()
interferes with optimization. See full commit message from Linux below.

So it makes sense to get rid of __memzero in Xen as well.

 From ff5fdafc9e9702846480e0cea55ba861f72140a2 Mon Sep 17 00:00:00 2001
 From: Nicolas Pitre 
 Date: Fri, 19 Jan 2018 18:17:46 +0100
 Subject: [PATCH] ARM: 8745/1: get rid of __memzero()

 The __memzero assembly code is almost identical to memset's except for
 two orr instructions. The runtime performance of __memset(p, n) and
 memset(p, 0, n) is accordingly almost identical.

 However, the memset() macro used to guard against a zero length and to
 call __memzero at compile time when the fill value is a constant zero
 interferes with compiler optimizations.

 Arnd found tha the test against a zero length brings up some new
 warnings with gcc v8:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82103

 And successively rremoving the test against a zero length and the call
 to __memzero optimization produces the following kernel sizes for
 defconfig with gcc 6:

 text data bss   dec   hex  filename
 12248142  6278960  413588  18940690   1210312  vmlinux.orig
 12244474  6278960  413588  18937022   120f4be  vmlinux.no_zero_test
 12239160  6278960  413588  18931708   120dffc  vmlinux.no_memzero

 So it is probably not worth keeping __memzero around given that the
 compiler can do a better job at inlining trivial memset(p,0,n) on its
 own. And the memset code already handles a zero length just fine.

 Suggested-by: Arnd Bergmann 
 Signed-off-by: Nicolas Pitre 
 Acked-by: Ard Biesheuvel 
 Acked-by: Arnd Bergmann 
 Signed-off-by: Russell King 

Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
ff5fdafc9e97
Signed-off-by: Julien Grall 


Reviewed-by: Jan Beulich 


Thanks!


with a suggestion:


--- a/xen/arch/arm/README.LinuxPrimitives
+++ b/xen/arch/arm/README.LinuxPrimitives
@@ -108,10 +108,9 @@ linux/arch/arm/lib/memchr.S 
xen/arch/arm/arm32/lib/memchr.S
  linux/arch/arm/lib/memcpy.S xen/arch/arm/arm32/lib/memcpy.S
  linux/arch/arm/lib/memmove.Sxen/arch/arm/arm32/lib/memmove.S
  linux/arch/arm/lib/memset.S xen/arch/arm/arm32/lib/memset.S
-linux/arch/arm/lib/memzero.Sxen/arch/arm/arm32/lib/memzero.S
  
  for i in copy_template.S memchr.S memcpy.S memmove.S memset.S \

- memzero.S ; do
+; do
  diff -u linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i
  done


Also do away with the line continuation at the same time? E.g.

for i in copy_template.S memchr.S memcpy.S memmove.S memset.S; do
 diff -u linux/arch/arm/lib/$i xen/arch/arm/arm32/lib/$i
done


I will go with this version because I don't expect the number of files 
to increase.


Cheers,

--
Julien Grall




Re: stable-4.18: reliably crash network driver domain by squeezing free_memory

2024-12-02 Thread Marek Marczykowski-Górecki
On Mon, Dec 02, 2024 at 03:54:23PM +, James Dingwall wrote:
> So there is some undefined range for (free_memory - m) to (free_memory - n)
> where it is possible to crash the driver domain depending on the guest
> startup ordering.  My (perhaps naive) reasoning would be that
> free_memory is the resource available to safely assign without having to
> allow for some unknown overhead and if I do ask for too much then I
> get a 'safe' failure.

FWIW Qubes OS tries to always keep at least 50MB free for Xen. So, our
formula is (free_memory - 50MB).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


Re: [PATCH 06/15] x86/hyperlaunch: introduce the domain builder

2024-12-02 Thread Jason Andryuk

On 2024-12-02 04:55, Jan Beulich wrote:

On 25.11.2024 18:52, Jason Andryuk wrote:

On 2024-11-23 13:20, Daniel P. Smith wrote:

--- /dev/null
+++ b/xen/arch/x86/domain_builder/fdt.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2024, Apertus Solutions, LLC
+ */
+#include 
+#include 
+#include 
+#include 
+#include  /* required for asm/setup.h */


Should asm/setup.h just be changed?


Why would it need changing (and why's that #include needed)? It has a
proper forward decl of the struct tag, and I can't see why it would need
anything else.


My question was suggesting to just make the changes, but that was 
already done by Frediano in aa4ad424f0 ("x86/setup: Make setup.h header 
self contained").


I think Dan's comment predates aa4ad424f0.  So it is now stale and the 
whole line should be removed.


Regards,
Jason



Re: [PATCH v6 1/3] xen/arm: mpu: Create boot-time MPU protection regions

2024-12-02 Thread Ayan Kumar Halder



On 30/11/2024 17:45, Julien Grall wrote:

Hi Ayan,

Hi Julien,


On 18/11/2024 12:12, Ayan Kumar Halder wrote:

+/*
+ * Macro to prepare and set a EL2 MPU memory region.
+ * We will also create an according MPU memory region entry, which
+ * is a structure of pr_t,  in table \prmap.
+ *
+ * sel: region selector
+ * base:    reg storing base address
+ * limit:   reg storing limit address
+ * prbar:   store computed PRBAR_EL2 value
+ * prlar:   store computed PRLAR_EL2 value
+ * maxcount:    maximum number of EL2 regions supported
+ * attr_prbar:  PRBAR_EL2-related memory attributes. If not 
specified it will be

+ *  REGION_DATA_PRBAR
+ * attr_prlar:  PRLAR_EL2-related memory attributes. If not 
specified it will be

+ *  REGION_NORMAL_PRLAR
+ *
+ * Preserves \maxcount
+ * Clobbers \sel, \base, \limit, \prbar, \prlar


Per this line, "\sel" is clobbered. Which implies the caller cannot 
rely on the value. Yet ...



+ *
+ * Note that all parameters using registers should be distinct.
+ */
+.macro prepare_xen_region, sel, base, limit, prbar, prlar, maxcount, 
attr_prbar=REGION_DATA_PRBAR, attr_prlar=REGION_NORMAL_PRLAR

+    /* Check if the region is empty */
+    cmp   \base, \limit
+    beq   1f
+
+    /* Check if the number of regions exceeded the count specified 
in MPUIR_EL2 */

+    cmp   \sel, \maxcount
+    bge   fail_insufficient_regions
+
+    /* Prepare value for PRBAR_EL2 reg and preserve it in \prbar.*/
+    and   \base, \base, #MPU_REGION_MASK
+    mov   \prbar, #\attr_prbar
+    orr   \prbar, \prbar, \base
+
+    /* Limit address should be inclusive */
+    sub   \limit, \limit, #1
+    and   \limit, \limit, #MPU_REGION_MASK
+    mov   \prlar, #\attr_prlar
+    orr   \prlar, \prlar, \limit
+
+    msr   PRSELR_EL2, \sel
+    isb
+    msr   PRBAR_EL2, \prbar
+    msr   PRLAR_EL2, \prlar
+    dsb   sy
+    isb
+
+    add   \sel, \sel, #1
+
+1:
+.endm
+
+/*
+ * Failure caused due to insufficient MPU regions.
+ */
+FUNC_LOCAL(fail_insufficient_regions)
+    PRINT("- Selected MPU region is above the implemented number in 
MPUIR_EL2 -\r\n")

+1:  wfe
+    b   1b
+END(fail_insufficient_regions)
+
+/*
+ * Maps the various sections of Xen (described in xen.lds.S) as 
different MPU

+ * regions.
+ *
+ * Clobbers x0 - x5
+ *
+ */
+FUNC(enable_boot_cpu_mm)
+    /* Get the number of regions specified in MPUIR_EL2 */
+    mrs   x5, MPUIR_EL2
+    and   x5, x5, #NUM_MPU_REGIONS_MASK
+
+    /* x0: region sel */
+    mov   x0, xzr
+    /* Xen text section. */
+    ldr   x1, =_stext
+    ldr   x2, =_etext
+    prepare_xen_region x0, x1, x2, x3, x4, x5, 
attr_prbar=REGION_TEXT_PRBAR

+
+    /* Xen read-only data section. */
+    ldr   x1, =_srodata
+    ldr   x2, =_erodata
+    prepare_xen_region x0, x1, x2, x3, x4, x5, 
attr_prbar=REGION_RO_PRBAR



... you will pass x0 (\sel) without any update from the previous call. 
So effectively, you treat \sel as an input/output that will get 
incremented.


Therefore the comment needs to be updated. Possibly to:

"
output:

sel: Next available index in the MPU
"

Yes,  makes sense.


I will commit the series once we agree on the wording.


Yes, agreed with the wording.

- Ayan




Re: [PATCH v10 08/12] xen/page_alloc: introduce preserved page flags macro

2024-12-02 Thread Jan Beulich
On 02.12.2024 13:51, Andrea Bastoni wrote:
> The proposal would be to go back to v9, which reduces (for the PGC)
> the management to colored only:
> 
>> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> ...
>> @@ -2382,7 +2556,7 @@ int assign_pages(
>>
>>  for ( i = 0; i < nr; i++ )
>>  {
>> -ASSERT(!(pg[i].count_info & ~(PGC_extra | PGC_static)));
>> +ASSERT(!(pg[i].count_info & ~(PGC_extra | PGC_static | 
>> PGC_colored)));
>>  if ( pg[i].count_info & PGC_extra )
>>  extra_pages++;
>>  }
>> @@ -2442,7 +2616,8 @@ int assign_pages(
>>  page_set_owner(&pg[i], d);
>>  smp_wmb(); /* Domain pointer must be visible before updating 
>> refcnt. */
>>  pg[i].count_info =
>> -(pg[i].count_info & (PGC_extra | PGC_static)) | PGC_allocated | 
>> 1;
>> +(pg[i].count_info & (PGC_extra | PGC_static | PGC_colored)) |
>> +PGC_allocated | 1;
>>
>>  page_list_add_tail(&pg[i], page_to_list(d, &pg[i]));

So what's wrong with having PGC_preserved _just_ for these two instances?

Jan



[PATCH v11 00/12] Arm cache coloring

2024-12-02 Thread Carlo Nonato
Shared caches in multi-core CPU architectures represent a problem for
predictability of memory access latency. This jeopardizes applicability
of many Arm platform in real-time critical and mixed-criticality
scenarios. We introduce support for cache partitioning with page
coloring, a transparent software technique that enables isolation
between domains and Xen, and thus avoids cache interference.

When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
the user to define assignments of cache partitions ids, called colors,
where assigning different colors guarantees no mutual eviction on cache
will ever happen. This instructs the Xen memory allocator to provide
the i-th color assignee only with pages that maps to color i, i.e. that
are indexed in the i-th cache partition.

The proposed implementation supports the dom0less feature.
The proposed implementation doesn't support the static-mem feature.
The solution has been tested in several scenarios, including Xilinx Zynq
MPSoCs.

Carlo Nonato (11):
  xen/common: add cache coloring common code
  xen/arm: add initial support for LLC coloring on arm64
  xen/arm: permit non direct-mapped Dom0 construction
  xen/arm: add Dom0 cache coloring support
  xen: extend domctl interface for cache coloring
  tools: add support for cache coloring configuration
  xen/arm: add support for cache coloring configuration via device-tree
  xen/page_alloc: introduce preserved page flags macro
  xen: add cache coloring allocator for domains
  xen/arm: make consider_modules() available for xen relocation
  xen/arm: add cache coloring support for Xen image

Luca Miccio (1):
  xen/arm: add Xen cache colors command line parameter

 SUPPORT.md  |   7 +
 docs/index.rst  |   1 +
 docs/man/xl.cfg.5.pod.in|   6 +
 docs/misc/arm/device-tree/booting.txt   |   5 +
 docs/misc/cache-coloring.rst| 248 +++
 docs/misc/xen-command-line.pandoc   |  70 +
 tools/golang/xenlight/helpers.gen.go|  16 +
 tools/golang/xenlight/types.gen.go  |   1 +
 tools/include/libxl.h   |   5 +
 tools/include/xenctrl.h |   9 +
 tools/libs/ctrl/xc_domain.c |  34 +++
 tools/libs/light/libxl_create.c |  18 ++
 tools/libs/light/libxl_types.idl|   1 +
 tools/xl/xl_parse.c |  38 ++-
 xen/arch/arm/Kconfig|   1 +
 xen/arch/arm/Makefile   |   1 +
 xen/arch/arm/alternative.c  |  30 +-
 xen/arch/arm/arm32/mmu/mm.c |  95 +-
 xen/arch/arm/arm64/mmu/head.S   |  58 +++-
 xen/arch/arm/arm64/mmu/mm.c |  28 +-
 xen/arch/arm/dom0less-build.c   |  60 +---
 xen/arch/arm/domain_build.c | 107 ++-
 xen/arch/arm/include/asm/domain_build.h |   1 +
 xen/arch/arm/include/asm/mm.h   |   5 +
 xen/arch/arm/include/asm/mmu/layout.h   |   3 +
 xen/arch/arm/include/asm/processor.h|  15 +
 xen/arch/arm/include/asm/setup.h|   3 +
 xen/arch/arm/llc-coloring.c | 142 +
 xen/arch/arm/mmu/setup.c| 193 +++-
 xen/arch/arm/setup.c|  13 +-
 xen/common/Kconfig  |  29 ++
 xen/common/Makefile |   1 +
 xen/common/domain.c |   3 +
 xen/common/domctl.c |  10 +
 xen/common/keyhandler.c |   3 +
 xen/common/llc-coloring.c   | 388 
 xen/common/page_alloc.c | 209 -
 xen/include/public/domctl.h |   9 +
 xen/include/xen/llc-coloring.h  |  64 
 xen/include/xen/sched.h |   5 +
 xen/include/xen/xmalloc.h   |  12 +
 41 files changed, 1775 insertions(+), 172 deletions(-)
 create mode 100644 docs/misc/cache-coloring.rst
 create mode 100644 xen/arch/arm/llc-coloring.c
 create mode 100644 xen/common/llc-coloring.c
 create mode 100644 xen/include/xen/llc-coloring.h

-- 
2.43.0




[PATCH v11 05/12] xen: extend domctl interface for cache coloring

2024-12-02 Thread Carlo Nonato
Add a new domctl hypercall to allow the user to set LLC coloring
configurations. Colors can be set only once, just after domain creation,
since recoloring isn't supported.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
Reviewed-by: Jan Beulich 
---
v11:
- no changes
v10:
- no changes
v9:
- minor printk message changes
- moved domain_llc_coloring_free() in this patch
v8:
- fixed memory leak on error path of domain_set_llc_colors()
v7:
- -EOPNOTSUPP returned in case of hypercall called without llc_coloring_enabled
- domain_set_llc_colors_domctl() renamed to domain_set_llc_colors()
- added padding and input bound checks to domain_set_llc_colors()
- removed alloc_colors() helper usage from domain_set_llc_colors()
v6:
- reverted the XEN_DOMCTL_INTERFACE_VERSION bump
- reverted to uint32 for the guest handle
- explicit padding added to the domctl struct
- rewrote domain_set_llc_colors_domctl() to be more explicit
v5:
- added a new hypercall to set colors
- uint for the guest handle
v4:
- updated XEN_DOMCTL_INTERFACE_VERSION
---
 xen/common/domain.c|  3 ++
 xen/common/domctl.c| 10 +++
 xen/common/llc-coloring.c  | 55 --
 xen/include/public/domctl.h|  9 ++
 xen/include/xen/llc-coloring.h |  5 
 5 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 92263a4fbd..842a23751a 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1276,6 +1277,8 @@ void domain_destroy(struct domain *d)
 {
 BUG_ON(!d->is_dying);
 
+domain_llc_coloring_free(d);
+
 /* May be already destroyed, or get_domain() can race us. */
 if ( atomic_cmpxchg(&d->refcnt, 0, DOMAIN_DESTROYED) != 0 )
 return;
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index ea16b75910..6387dddbcd 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -866,6 +867,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
u_domctl)
 __HYPERVISOR_domctl, "h", u_domctl);
 break;
 
+case XEN_DOMCTL_set_llc_colors:
+if ( op->u.set_llc_colors.pad )
+ret = -EINVAL;
+else if ( llc_coloring_enabled )
+ret = domain_set_llc_colors(d, &op->u.set_llc_colors);
+else
+ret = -EOPNOTSUPP;
+break;
+
 default:
 ret = arch_do_domctl(op, d, u_domctl);
 break;
diff --git a/xen/common/llc-coloring.c b/xen/common/llc-coloring.c
index 8f076849c1..2a0ee695c8 100644
--- a/xen/common/llc-coloring.c
+++ b/xen/common/llc-coloring.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2024, Advanced Micro Devices, Inc.
  * Copyright (C) 2024, Minerva Systems SRL
  */
+#include 
 #include 
 #include 
 #include 
@@ -109,8 +110,7 @@ static void print_colors(const unsigned int colors[], 
unsigned int num_colors)
 printk(" }\n");
 }
 
-static bool __init check_colors(const unsigned int colors[],
-unsigned int num_colors)
+static bool check_colors(const unsigned int colors[], unsigned int num_colors)
 {
 unsigned int i;
 
@@ -191,7 +191,7 @@ void domain_dump_llc_colors(const struct domain *d)
 print_colors(d->llc_colors, d->num_llc_colors);
 }
 
-static void __init domain_set_default_colors(struct domain *d)
+static void domain_set_default_colors(struct domain *d)
 {
 printk(XENLOG_WARNING
"LLC color config not found for %pd, using all colors\n", d);
@@ -228,6 +228,55 @@ int __init dom0_set_llc_colors(struct domain *d)
 return 0;
 }
 
+int domain_set_llc_colors(struct domain *d,
+  const struct xen_domctl_set_llc_colors *config)
+{
+unsigned int *colors;
+
+if ( d->num_llc_colors )
+return -EEXIST;
+
+if ( !config->num_llc_colors )
+{
+domain_set_default_colors(d);
+return 0;
+}
+
+if ( config->num_llc_colors > max_nr_colors )
+return -EINVAL;
+
+colors = xmalloc_array(unsigned int, config->num_llc_colors);
+if ( !colors )
+return -ENOMEM;
+
+if ( copy_from_guest(colors, config->llc_colors, config->num_llc_colors) )
+{
+xfree(colors);
+return -EFAULT;
+}
+
+if ( !check_colors(colors, config->num_llc_colors) )
+{
+printk(XENLOG_ERR "%pd: bad LLC color config\n", d);
+xfree(colors);
+return -EINVAL;
+}
+
+d->llc_colors = colors;
+d->num_llc_colors = config->num_llc_colors;
+
+return 0;
+}
+
+void domain_llc_coloring_free(struct domain *d)
+{
+if ( !llc_coloring_enabled || d->llc_colors == default_colors )
+return;
+
+/* free pointer-to-const using __va(__pa()) */
+xfree(__va(__pa(d->llc_colors)));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xe

Re: [PATCH v2 2/2] tools/xl: add suspend and resume subcommands

2024-12-02 Thread Jason Andryuk

On 2024-11-28 12:19, Anthony PERARD wrote:

On Tue, Nov 26, 2024 at 12:19:41PM -0500, Jason Andryuk wrote:

diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index c45d497c28..3160966972 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -42,6 +42,16 @@ static void unpause_domain(uint32_t domid)
  libxl_domain_unpause(ctx, domid, NULL);
  }
  
+static void suspend_domain(uint32_t domid)

+{
+libxl_domain_suspend_only(ctx, domid, NULL);
+}
+
+static void resume_domain(uint32_t domid)
+{
+libxl_domain_resume(ctx, domid, 1, NULL);
+}
+
  static void destroy_domain(uint32_t domid, int force)
  {
  int rc;
@@ -82,6 +92,32 @@ int main_unpause(int argc, char **argv)
  return EXIT_SUCCESS;
  }
  
+int main_suspend(int argc, char **argv)

+{
+int opt;
+
+SWITCH_FOREACH_OPT(opt, "", NULL, "suspend", 1) {
+/* No options */
+}
+
+suspend_domain(find_domain(argv[optind]));
+
+return EXIT_SUCCESS;
+}
+
+int main_resume(int argc, char **argv)
+{
+int opt;
+
+SWITCH_FOREACH_OPT(opt, "", NULL, "resume", 1) {
+/* No options */
+}
+
+resume_domain(find_domain(argv[optind]));
+
+return EXIT_SUCCESS;
+}


These four new functions in xl_vmcontrol.c needs to be hidden behind
LIBXL_HAVE_NO_SUSPEND_RESUME, like the whole xl_migrate.c file is.
Both prototypes for main_*() are already hidden as well as the new
command in xl_cmdtables.

Or alternatively, we could probably have the command been present on
Arm, but I don't know if libxl_domain_suspend_only() is going to work.
It looks like it only depends on the hypervisor. I can't find any logic
that would treat Arm differently, besides the presence of
LIBXL_HAVE_NO_SUSPEND_RESUME.

But best bet would be to hide those four functions when
LIBXL_HAVE_NO_SUSPEND_RESUME is defined.


Thanks.  Yes, I'll hide them.

Regards,
Jason




Re: [PATCH v2 1/2] xl: Keep monitoring suspended domain

2024-12-02 Thread Jason Andryuk

On 2024-12-02 03:53, Jan Beulich wrote:

On 28.11.2024 17:45, Anthony PERARD wrote:

On Tue, Nov 26, 2024 at 12:19:40PM -0500, Jason Andryuk wrote:

When a VM transitioned to LIBXL_SHUTDOWN_REASON_SUSPEND, the xl daemon
was exiting as 0 = DOMAIN_RESTART_NONE "No domain restart".
Later, when the VM actually shutdown, the missing xl daemon meant the
domain wasn't cleaned up properly.

Add a new DOMAIN_RESTART_SUSPENDED to handle the case.  The xl daemon
keeps running to react to future shutdown events.

The domain death event needs to be re-enabled to catch subsequent
events.  The libxl_evgen_domain_death is moved from death_list to
death_reported, and then it isn't found on subsequent iterations through
death_list.  We enable the new event before disabling the old event, to
keep the xenstore watch active.  If it is unregistered and
re-registered, it'll fire immediately for our suspended domain which
will end up continuously re-triggering.

Signed-off-by: Jason Andryuk 


Reviewed-by: Anthony PERARD 


While committing I was wondering: Does this want/need backporting (and hence
was it perhaps lacking a Fixes: tag)?


Thanks, Jan.

I don't think it's really worth backporting.  Mainly, it hasn't been an 
issue in the last 14 years.  A Linux domU doesn't suspend itself - it 
only does so in response to a xenstore watch.  A domU *could* suspend 
itself without the xenstore watch, but that doesn't seem to happen in 
practice.  Since xl has not been able to generate those xenstore events 
prior to the `xl suspend` introduction, this code path hasn't run or 
been an issue.


The tag would be:
Fixes: 1a0e17891f ("xl: support on_{poweroff,reboot,crash} domain 
configuration options.")


Regards,
Jason



New Xen Project Website

2024-12-02 Thread Kelly Choi
Hello Xen Community,

For the past 12 months, I have been working with our web developer and UX
designer on creating and refreshing our project website. I'm excited to
share that this is now live!
The URL remains the same and you can view this at *https://xenproject.org/
 *

These new changes are part of an ongoing effort to increase our visibility
and make it easier to find information on pages.

Our documentation will still link to our old wiki for now, as this is where
most of our guides are stored. However, you might remember we are in the
process of moving to Sphinx so we will eventually update the website once
we have more content created. We welcome community members to contribute to
this!

I'd also like to thank our existing board members, whose financial
investment in the Xen Project allows for these changes.

If you do have any feedback or notice errors, please feel free to reach out.

Thanks,
Kelly Choi
Community Manager
Xen Project 


[PATCH] libs/guest: Fix migration compatibility with a security-patched Xen 4.13

2024-12-02 Thread Andrew Cooper
xc_cpuid_apply_policy() provides compatibility for migration of a pre-4.14 VM
where no CPUID data was provided in the stream.

It guesses the various max-leaf limits, based on what was true at the time of
writing, but this was not correctly adapted when speculative security issues
forced the advertisement of new feature bits.  Of note are:

 * LFENCE-DISPATCH, in leaf 0x8021.eax
 * BHI-CTRL, in leaf 0x7[2].edx

In both cases, a VM booted on a security-patched Xen 4.13, and then migrated
on to any newer version of Xen on the same or compatible hardware would have
these features stripped back because Xen is still editing the cpu-policy for
sanity behind the back of the toolstack.

For VMs using BHI_DIS_S to mitigate Native-BHI, this resulted in a failure to
restore the guests MSR_SPEC_CTRL setting:

  (XEN) HVM d7v0 load MSR 0x48 with value 0x401 failed
  (XEN) HVM7 restore: failed to load entry 20/0 rc -6

Fixes: e9b4fe263649 ("x86/cpuid: support LFENCE always serialising CPUID bit")
Fixes: f3709b15fc86 ("x86/cpuid: Infrastructure for cpuid word 7:2.edx")
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

Every option here is terrible.  This is the least terrible one I could come up
with.

XenServer, still the only caller to provide featureset into
xc_cpuid_apply_policy() could in principle reverse engineer these two numbers
from the non-zero-ness of the passed-in feature words.  However, such an
approach more complicated, equally as fragile, and not reusable with the
current plans to re-integrate the shrinking logic.  i.e. this is the least
invasive option.
---
 tools/libs/guest/xg_cpuid_x86.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index 4453178100ad..263a9d4787b6 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -640,7 +640,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t 
domid, bool restore,
  *
  * This restore path is used for incoming VMs with no CPUID data
  * i.e. originated on Xen 4.13 or earlier.  We must invent a policy
- * compatible with what Xen 4.13 would have done on the same hardware.
+ * compatible with what a security-patched Xen 4.13 would have done on
+ * the same hardware.
  *
  * Specifically:
  * - Clamp max leaves.
@@ -657,8 +658,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t 
domid, bool restore,
 }
 
 p->policy.basic.max_leaf = min(p->policy.basic.max_leaf, 0xdu);
-p->policy.feat.max_subleaf = 0;
-p->policy.extd.max_leaf = min(p->policy.extd.max_leaf, 0x801c);
+p->policy.feat.max_subleaf = min(p->policy.feat.max_subleaf, 0x2u);
+p->policy.extd.max_leaf = min(p->policy.extd.max_leaf, 0x8021);
 }
 
 if ( featureset )

base-commit: 126b0a6e537ce1d486a29e35cfeec1f222a74d11
-- 
2.39.5




Re: [PATCH v2 1/4] common: remove -fno-stack-protector from EMBEDDED_EXTRA_CFLAGS

2024-12-02 Thread Jan Beulich
On 30.11.2024 02:10, Volodymyr Babchuk wrote:
> This patch is preparation for making stack protector
> configurable. First step is to remove -fno-stack-protector flag from
> EMBEDDED_EXTRA_CFLAGS so separate projects (Hypervisor in this case)
> can enable/disable this feature by themselves.

s/projects/components/ ?

> --- a/stubdom/Makefile
> +++ b/stubdom/Makefile
> @@ -54,6 +54,8 @@ TARGET_CFLAGS += $(CFLAGS)
>  TARGET_CPPFLAGS += $(CPPFLAGS)
>  $(call cc-options-add,TARGET_CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
>  
> +$(call cc-option-add,TARGET_CFLAGS,CC,-fno-stack-protector)
> +
>  # Do not use host headers and libs
>  GCC_INSTALL = $(shell LANG=C gcc -print-search-dirs | sed -n -e 's/install: 
> \(.*\)/\1/p')
>  TARGET_CPPFLAGS += -U __linux__ -U __FreeBSD__ -U __sun__
> --- a/tools/firmware/Rules.mk
> +++ b/tools/firmware/Rules.mk
> @@ -15,6 +15,8 @@ $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
>  
>  $(call cc-option-add,CFLAGS,CC,-fcf-protection=none)
>  
> +$(call cc-option-add,CFLAGS,CC,-fno-stack-protector)
> +
>  # Do not add the .note.gnu.property section to any of the firmware objects: 
> it
>  # breaks the rombios binary and is not useful for firmware anyway.
>  $(call cc-option-add,CFLAGS,CC,-Wa$$(comma)-mx86-used-note=no)
> --- a/tools/tests/x86_emulator/testcase.mk
> +++ b/tools/tests/x86_emulator/testcase.mk
> @@ -4,6 +4,8 @@ include $(XEN_ROOT)/tools/Rules.mk
>  
>  $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
>  
> +$(call cc-option-add,CFLAGS,CC,-fno-stack-protector)

Is use of cc-option-add really necessary throughout here, when ...

> --- a/xen/Makefile
> +++ b/xen/Makefile
> @@ -432,6 +432,8 @@ else
>  CFLAGS_UBSAN :=
>  endif
>  
> +CFLAGS += -fno-stack-protector

... is isn't needed here? Iirc the compiler version ranges supported don't
vary between components. Then again afaics $(EMBEDDED_EXTRA_CFLAGS) is used
by x86 only right now, and with cc-options-add, so perhaps it (a) needs
using cc-options-add here, too, and (b) it wants explaining why this needs
generalizing from x86 to all architectures. Quite possibly hypervisor use
of $(EMBEDDED_EXTRA_CFLAGS) may want generalizing separately, up front?

Jan



Re: [PATCH v2 4/4] xen: riscv: enable stack protector feature

2024-12-02 Thread Jan Beulich
On 30.11.2024 02:10, Volodymyr Babchuk wrote:
> Enable previously added CONFIG_STACK_PROTECTOR feature for RISC-V
> platform. Here we can call boot_stack_chk_guard_setup() in start_xen()
> function, because it never returns, so stack protector code will not
> be triggered because of changed canary.
> 
> Signed-off-by: Volodymyr Babchuk 
> Tested-by: Oleksii Kurochko 

Isn't this premature? For ...

> @@ -57,6 +58,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
>  if ( !boot_fdt_info(device_tree_flattened, dtb_addr) )
>  BUG();
>  
> +boot_stack_chk_guard_setup();

... this function's use of get_random(), either arch_get_random() needs
to be implemented, or (as Julien also pointed out for Arm64) NOW() needs
to work. Yet get_s_time() presently expands to just BUG_ON(). Given this
it's not even clear to me how Oleksii managed to actually test this.

Jan



Re: [PATCH v2 2/4] xen: common: add ability to enable stack protector

2024-12-02 Thread Jan Beulich
On 30.11.2024 02:10, Volodymyr Babchuk wrote:
> --- /dev/null
> +++ b/xen/common/stack-protector.c
> @@ -0,0 +1,10 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include 
> +#include 
> +
> +unsigned long __ro_after_init __stack_chk_guard;
> +
> +void __stack_chk_fail(void)
> +{
> +panic("Detected stack corruption\n");
> +}

That's very little information that'll end up in the log to understand
what has gone wrong.

> --- /dev/null
> +++ b/xen/include/xen/stack-protector.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef XEN__STACK_PROTECTOR_H
> +#define XEN__STACK_PROTECTOR_H
> +
> +#ifdef CONFIG_STACKPROTECTOR
> +
> +extern unsigned long __stack_chk_guard;
> +
> +/*
> + * This function should be always inlined. Also it should be called
> + * from a function that never returns or a function that with
> + * stack-protector disabled.
> + */
> +static always_inline void boot_stack_chk_guard_setup(void)
> +{
> + __stack_chk_guard = get_random();
> + if (BITS_PER_LONG == 64)
> + __stack_chk_guard |= ((unsigned long)get_random()) << 32;
> +}

The hard tabs here make it look like Linux style, when - unless there's a
specific reason - new files want to be Xen style.

Jan



[RFC PATCH 0/2] KVM: SEV: Prefer WBNOINVD over WBINVD for cache maintenance efficiency

2024-12-02 Thread Kevin Loughlin
AMD CPUs currently execute WBINVD in the host when unregistering SEV
guest memory or when deactivating SEV guests. Such cache maintenance is
performed to prevent data corruption, wherein the encrypted (C=1)
version of a dirty cache line might otherwise only be written back
after the memory is written in a different context (ex: C=0), yielding
corruption. However, WBINVD is performance-costly, especially because
it invalidates processor caches.

Strictly-speaking, unless the SEV ASID is being recycled (meaning all
existing cache lines with the recycled ASID must be flushed), the
cache invalidation triggered by WBINVD is unnecessary; only the
writeback is needed to prevent data corruption in remaining scenarios.

To improve performance in these scenarios, use WBNOINVD when available
instead of WBINVD. WBNOINVD still writes back all dirty lines
(preventing host data corruption by SEV guests) but does *not*
invalidate processor caches.

First, provide helper functions to use WBINVD similar to how WBINVD is
invoked. Second, check for WBNOINVD support and execute WBNOINVD if
possible in lieu of WBINVD to avoid cache invalidations

Kevin Loughlin (2):
  x86, lib, xenpv: Add WBNOINVD helper functions
  KVM: SEV: Prefer WBNOINVD over WBINVD for cache maintenance efficiency

 arch/x86/include/asm/paravirt.h   |  7 ++
 arch/x86/include/asm/paravirt_types.h |  1 +
 arch/x86/include/asm/smp.h|  7 ++
 arch/x86/include/asm/special_insns.h  | 12 -
 arch/x86/kernel/paravirt.c|  6 +
 arch/x86/kvm/svm/sev.c| 35 +--
 arch/x86/lib/cache-smp.c  | 12 +
 arch/x86/xen/enlighten_pv.c   |  1 +
 8 files changed, 67 insertions(+), 14 deletions(-)

-- 
2.47.0.338.g60cca15819-goog




RE: [PATCH, RESEND] x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()

2024-12-02 Thread Michael Kelley
From: Kirill A. Shutemov  Sent: Sunday, 
December 1, 2024 11:32 PM
> 
> Rename the helper to better reflect its function.
> 
> Signed-off-by: Kirill A. Shutemov 
> Suggested-by: Dave Hansen 
> Acked-by: Dave Hansen 

FWIW, I previously gave my "Reviewed-by" on this patch [1].
I didn't call it out explicitly, but did so for the patch in general
as well as specifically for the Hyper-V related change.

Michael

[1]: 
https://lore.kernel.org/linux-hyperv/sn6pr02mb4157c91ee70f4ef4b6edde46d4...@sn6pr02mb4157.namprd02.prod.outlook.com/

> ---
>  arch/x86/hyperv/ivm.c  |  2 +-
>  arch/x86/include/asm/mtrr.h| 10 +-
>  arch/x86/kernel/cpu/mtrr/generic.c |  6 +++---
>  arch/x86/kernel/cpu/mtrr/mtrr.c|  2 +-
>  arch/x86/kernel/kvm.c  |  2 +-
>  arch/x86/xen/enlighten_pv.c|  4 ++--
>  6 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 60fc3ed72830..90aabe1fd3b6 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -664,7 +664,7 @@ void __init hv_vtom_init(void)
>   x86_platform.guest.enc_status_change_finish = 
> hv_vtom_set_host_visibility;
> 
>   /* Set WB as the default cache mode. */
> - mtrr_overwrite_state(NULL, 0, MTRR_TYPE_WRBACK);
> + guest_force_mtrr_state(NULL, 0, MTRR_TYPE_WRBACK);
>  }
> 
>  #endif /* defined(CONFIG_AMD_MEM_ENCRYPT) ||
> defined(CONFIG_INTEL_TDX_GUEST) */
> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> index 4218248083d9..c69e269937c5 100644
> --- a/arch/x86/include/asm/mtrr.h
> +++ b/arch/x86/include/asm/mtrr.h
> @@ -58,8 +58,8 @@ struct mtrr_state_type {
>   */
>  # ifdef CONFIG_MTRR
>  void mtrr_bp_init(void);
> -void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> -   mtrr_type def_type);
> +void guest_force_mtrr_state(struct mtrr_var_range *var, unsigned int num_var,
> + mtrr_type def_type);
>  extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
>  extern void mtrr_save_fixed_ranges(void *);
>  extern void mtrr_save_state(void);
> @@ -75,9 +75,9 @@ void mtrr_disable(void);
>  void mtrr_enable(void);
>  void mtrr_generic_set_state(void);
>  #  else
> -static inline void mtrr_overwrite_state(struct mtrr_var_range *var,
> - unsigned int num_var,
> - mtrr_type def_type)
> +static inline void guest_force_mtrr_state(struct mtrr_var_range *var,
> +   unsigned int num_var,
> +   mtrr_type def_type)
>  {
>  }
> 
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c 
> b/arch/x86/kernel/cpu/mtrr/generic.c
> index 7b29ebda024f..2fdfda2b60e4 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -423,7 +423,7 @@ void __init mtrr_copy_map(void)
>  }
> 
>  /**
> - * mtrr_overwrite_state - set static MTRR state
> + * guest_force_mtrr_state - set static MTRR state for a guest
>   *
>   * Used to set MTRR state via different means (e.g. with data obtained from
>   * a hypervisor).
> @@ -436,8 +436,8 @@ void __init mtrr_copy_map(void)
>   * @num_var: length of the @var array
>   * @def_type: default caching type
>   */
> -void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var,
> -   mtrr_type def_type)
> +void guest_force_mtrr_state(struct mtrr_var_range *var, unsigned int num_var,
> + mtrr_type def_type)
>  {
>   unsigned int i;
> 
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index 989d368be04f..ecbda0341a8a 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -625,7 +625,7 @@ void mtrr_save_state(void)
>  static int __init mtrr_init_finalize(void)
>  {
>   /*
> -  * Map might exist if mtrr_overwrite_state() has been called or if
> +  * Map might exist if guest_force_mtrr_state() has been called or if
>* mtrr_enabled() returns true.
>*/
>   mtrr_copy_map();
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 21e9e4845354..7a422a6c5983 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -983,7 +983,7 @@ static void __init kvm_init_platform(void)
>   x86_platform.apic_post_init = kvm_apic_init;
> 
>   /* Set WB as the default cache mode for SEV-SNP and TDX */
> - mtrr_overwrite_state(NULL, 0, MTRR_TYPE_WRBACK);
> + guest_force_mtrr_state(NULL, 0, MTRR_TYPE_WRBACK);
>  }
> 
>  #if defined(CONFIG_AMD_MEM_ENCRYPT)
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index d6818c6cafda..633469fab536 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -171,7 +171,7 @@ static void __init xen_set_mtrr_data(void)
> 
>   /* Only overwrite MTRR state if any MTRR could be got from Xen. */
> 

Re: [RFC PATCH 1/2] x86, lib, xenpv: Add WBNOINVD helper functions

2024-12-02 Thread Juergen Gross

On 03.12.24 05:09, Xin Li wrote:

On 12/2/2024 5:44 PM, Kevin Loughlin wrote:

On Mon, Dec 2, 2024 at 5:28 PM Andrew Cooper  wrote:


On 03/12/2024 12:59 am, Kevin Loughlin wrote:

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d4eb9e1d61b8..c040af2d8eff 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -187,6 +187,13 @@ static __always_inline void wbinvd(void)
   PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT_XEN);
  }

+extern noinstr void pv_native_wbnoinvd(void);
+
+static __always_inline void wbnoinvd(void)
+{
+ PVOP_ALT_VCALL0(cpu.wbnoinvd, "wbnoinvd", ALT_NOT_XEN);
+}


Given this, ...


diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index fec381533555..a66b708d8a1e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -149,6 +154,7 @@ struct paravirt_patch_template pv_ops = {
   .cpu.write_cr0  = native_write_cr0,
   .cpu.write_cr4  = native_write_cr4,
   .cpu.wbinvd = pv_native_wbinvd,
+ .cpu.wbnoinvd   = pv_native_wbnoinvd,
   .cpu.read_msr   = native_read_msr,
   .cpu.write_msr  = native_write_msr,
   .cpu.read_msr_safe  = native_read_msr_safe,


this, and ...


diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index d6818c6cafda..a5c76a6f8976 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1162,6 +1162,7 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
   .write_cr4 = xen_write_cr4,

   .wbinvd = pv_native_wbinvd,
+ .wbnoinvd = pv_native_wbnoinvd,

   .read_msr = xen_read_msr,
   .write_msr = xen_write_msr,


this, what is the point having a paravirt hook which is wired to
native_wbnoinvd() in all cases?

That just seems like overhead for overhead sake.


I'm mirroring what's done for WBINVD here, which was changed to a
paravirt hook in 10a099405fdf ("cpuidle, xenpv: Make more PARAVIRT_XXL
noinstr clean") in order to avoid calls out to instrumented code as
described in the commit message in more detail. I believe a hook is
similarly required for WBNOINVD, but please let me know if you
disagree. Thanks!


Then the question is why we need to add WBINVD/WBNOINVD to the paravirt
hooks.



We don't.

The wbinvd hook is a leftover from lguest times.

I'll send a patch to remove it.


Juergen

P.S.: As the paravirt maintainer I would have preferred to be Cc-ed in the
  initial patch mail.



Re: [PATCH v2 09/21] drm/etnaviv: Convert timeouts to secs_to_jiffies()

2024-12-02 Thread Christian Gmeiner
>
> Changes made with the following Coccinelle rules:
>
> @@ constant C; @@
>
> - msecs_to_jiffies(C * 1000)
> + secs_to_jiffies(C)
>
> @@ constant C; @@
>
> - msecs_to_jiffies(C * MSEC_PER_SEC)
> + secs_to_jiffies(C)
>
> Signed-off-by: Easwar Hariharan 

Reviewed-by: Christian Gmeiner 

> ---
>  drivers/gpu/drm/etnaviv/etnaviv_cmdbuf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_cmdbuf.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_cmdbuf.c
> index 
> 721d633aece9d4c81f0019e4c55884f26ee61c60..0f5a2c885d0ab7029c7248e15d6ea3c31823b782
>  100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_cmdbuf.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_cmdbuf.c
> @@ -100,7 +100,7 @@ int etnaviv_cmdbuf_init(struct etnaviv_cmdbuf_suballoc 
> *suballoc,
> mutex_unlock(&suballoc->lock);
> ret = wait_event_interruptible_timeout(suballoc->free_event,
>suballoc->free_space,
> -  msecs_to_jiffies(10 * 
> 1000));
> +  secs_to_jiffies(10));
> if (!ret) {
> dev_err(suballoc->dev,
> "Timeout waiting for cmdbuf space\n");
>
> --
> 2.34.1
>


-- 
greets
--
Christian Gmeiner, MSc

https://christian-gmeiner.info/privacypolicy



[RFC PATCH 2/2] KVM: SEV: Prefer WBNOINVD over WBINVD for cache maintenance efficiency

2024-12-02 Thread Kevin Loughlin
AMD CPUs currently execute WBINVD in the host when unregistering SEV
guest memory or when deactivating SEV guests. Such cache maintenance is
performed to prevent data corruption, wherein the encrypted (C=1)
version of a dirty cache line might otherwise only be written back
after the memory is written in a different context (ex: C=0), yielding
corruption. However, WBINVD is performance-costly, especially because
it invalidates processor caches.

Strictly-speaking, unless the SEV ASID is being recycled (meaning all
existing cache lines with the recycled ASID must be flushed), the
cache invalidation triggered by WBINVD is unnecessary; only the
writeback is needed to prevent data corruption in remaining scenarios.

To improve performance in these scenarios, use WBNOINVD when available
instead of WBINVD. WBNOINVD still writes back all dirty lines
(preventing host data corruption by SEV guests) but does *not*
invalidate processor caches.

Signed-off-by: Kevin Loughlin 
---
 arch/x86/kvm/svm/sev.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 943bd074a5d3..dbe40f728c4b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -116,6 +116,7 @@ static int sev_flush_asids(unsigned int min_asid, unsigned 
int max_asid)
 */
down_write(&sev_deactivate_lock);
 
+   /* SNP firmware expects WBINVD before SNP_DF_FLUSH, so do *not* use 
WBNOINVD */
wbinvd_on_all_cpus();
 
if (sev_snp_enabled)
@@ -710,6 +711,14 @@ static void sev_clflush_pages(struct page *pages[], 
unsigned long npages)
}
 }
 
+static void sev_wb_on_all_cpus(void)
+{
+   if (boot_cpu_has(X86_FEATURE_WBNOINVD))
+   wbnoinvd_on_all_cpus();
+   else
+   wbinvd_on_all_cpus();
+}
+
 static unsigned long get_num_contig_pages(unsigned long idx,
struct page **inpages, unsigned long npages)
 {
@@ -2774,11 +2783,11 @@ int sev_mem_enc_unregister_region(struct kvm *kvm,
}
 
/*
-* Ensure that all guest tagged cache entries are flushed before
-* releasing the pages back to the system for use. CLFLUSH will
-* not do this, so issue a WBINVD.
+* Ensure that all dirty guest tagged cache entries are written back
+* before releasing the pages back to the system for use. CLFLUSH will
+* not do this without SME_COHERENT, so issue a WB[NO]INVD.
 */
-   wbinvd_on_all_cpus();
+   sev_wb_on_all_cpus();
 
__unregister_enc_region_locked(kvm, region);
 
@@ -2900,11 +2909,11 @@ void sev_vm_destroy(struct kvm *kvm)
}
 
/*
-* Ensure that all guest tagged cache entries are flushed before
-* releasing the pages back to the system for use. CLFLUSH will
-* not do this, so issue a WBINVD.
+* Ensure that all dirty guest tagged cache entries are written back
+* before releasing the pages back to the system for use. CLFLUSH will
+* not do this without SME_COHERENT, so issue a WB[NO]INVD.
 */
-   wbinvd_on_all_cpus();
+   sev_wb_on_all_cpus();
 
/*
 * if userspace was terminated before unregistering the memory regions
@@ -3130,12 +3139,12 @@ static void sev_flush_encrypted_page(struct kvm_vcpu 
*vcpu, void *va)
 * by leaving stale encrypted data in the cache.
 */
if (WARN_ON_ONCE(wrmsrl_safe(MSR_AMD64_VM_PAGE_FLUSH, addr | asid)))
-   goto do_wbinvd;
+   goto do_wb_on_all_cpus;
 
return;
 
-do_wbinvd:
-   wbinvd_on_all_cpus();
+do_wb_on_all_cpus:
+   sev_wb_on_all_cpus();
 }
 
 void sev_guest_memory_reclaimed(struct kvm *kvm)
@@ -3149,7 +3158,7 @@ void sev_guest_memory_reclaimed(struct kvm *kvm)
if (!sev_guest(kvm) || sev_snp_guest(kvm))
return;
 
-   wbinvd_on_all_cpus();
+   sev_wb_on_all_cpus();
 }
 
 void sev_free_vcpu(struct kvm_vcpu *vcpu)
@@ -3858,7 +3867,7 @@ static int __sev_snp_update_protected_guest_state(struct 
kvm_vcpu *vcpu)
 * guest-mapped page rather than the initial one allocated
 * by KVM in svm->sev_es.vmsa. In theory, svm->sev_es.vmsa
 * could be free'd and cleaned up here, but that involves
-* cleanups like wbinvd_on_all_cpus() which would ideally
+* cleanups like sev_wb_on_all_cpus() which would ideally
 * be handled during teardown rather than guest boot.
 * Deferring that also allows the existing logic for SEV-ES
 * VMSAs to be re-used with minimal SNP-specific changes.
-- 
2.47.0.338.g60cca15819-goog




[RFC PATCH 1/2] x86, lib, xenpv: Add WBNOINVD helper functions

2024-12-02 Thread Kevin Loughlin
In line with WBINVD usage, add WBONINVD helper functions, accounting
for kernels built with and without CONFIG_PARAVIRT_XXL.

Signed-off-by: Kevin Loughlin 
---
 arch/x86/include/asm/paravirt.h   |  7 +++
 arch/x86/include/asm/paravirt_types.h |  1 +
 arch/x86/include/asm/smp.h|  7 +++
 arch/x86/include/asm/special_insns.h  | 12 +++-
 arch/x86/kernel/paravirt.c|  6 ++
 arch/x86/lib/cache-smp.c  | 12 
 arch/x86/xen/enlighten_pv.c   |  1 +
 7 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d4eb9e1d61b8..c040af2d8eff 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -187,6 +187,13 @@ static __always_inline void wbinvd(void)
PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT_XEN);
 }
 
+extern noinstr void pv_native_wbnoinvd(void);
+
+static __always_inline void wbnoinvd(void)
+{
+   PVOP_ALT_VCALL0(cpu.wbnoinvd, "wbnoinvd", ALT_NOT_XEN);
+}
+
 static inline u64 paravirt_read_msr(unsigned msr)
 {
return PVOP_CALL1(u64, cpu.read_msr, msr);
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 8d4fbe1be489..9a3f38ad1958 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -87,6 +87,7 @@ struct pv_cpu_ops {
 #endif
 
void (*wbinvd)(void);
+   void (*wbnoinvd)(void);
 
/* cpuid emulation, mostly so that caps bits can be disabled */
void (*cpuid)(unsigned int *eax, unsigned int *ebx,
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index ca073f40698f..ecf93a243b83 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -112,6 +112,7 @@ void native_play_dead(void);
 void play_dead_common(void);
 void wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
+int wbnoinvd_on_all_cpus(void);
 
 void smp_kick_mwait_play_dead(void);
 
@@ -160,6 +161,12 @@ static inline int wbinvd_on_all_cpus(void)
return 0;
 }
 
+static inline int wbnoinvd_on_all_cpus(void)
+{
+   wbnoinvd();
+   return 0;
+}
+
 static inline struct cpumask *cpu_llc_shared_mask(int cpu)
 {
return (struct cpumask *)cpumask_of(0);
diff --git a/arch/x86/include/asm/special_insns.h 
b/arch/x86/include/asm/special_insns.h
index aec6e2d3aa1d..c2d16ddcd79b 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -117,7 +117,12 @@ static inline void wrpkru(u32 pkru)
 
 static __always_inline void native_wbinvd(void)
 {
-   asm volatile("wbinvd": : :"memory");
+   asm volatile("wbinvd" : : : "memory");
+}
+
+static __always_inline void native_wbnoinvd(void)
+{
+   asm volatile("wbnoinvd" : : : "memory");
 }
 
 static inline unsigned long __read_cr4(void)
@@ -173,6 +178,11 @@ static __always_inline void wbinvd(void)
native_wbinvd();
 }
 
+static __always_inline void wbnoinvd(void)
+{
+   native_wbnoinvd();
+}
+
 #endif /* CONFIG_PARAVIRT_XXL */
 
 static __always_inline void clflush(volatile void *__p)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index fec381533555..a66b708d8a1e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -121,6 +121,11 @@ noinstr void pv_native_wbinvd(void)
native_wbinvd();
 }
 
+noinstr void pv_native_wbnoinvd(void)
+{
+   native_wbnoinvd();
+}
+
 static noinstr void pv_native_safe_halt(void)
 {
native_safe_halt();
@@ -149,6 +154,7 @@ struct paravirt_patch_template pv_ops = {
.cpu.write_cr0  = native_write_cr0,
.cpu.write_cr4  = native_write_cr4,
.cpu.wbinvd = pv_native_wbinvd,
+   .cpu.wbnoinvd   = pv_native_wbnoinvd,
.cpu.read_msr   = native_read_msr,
.cpu.write_msr  = native_write_msr,
.cpu.read_msr_safe  = native_read_msr_safe,
diff --git a/arch/x86/lib/cache-smp.c b/arch/x86/lib/cache-smp.c
index 7af743bd3b13..7ac5cca53031 100644
--- a/arch/x86/lib/cache-smp.c
+++ b/arch/x86/lib/cache-smp.c
@@ -20,3 +20,15 @@ int wbinvd_on_all_cpus(void)
return 0;
 }
 EXPORT_SYMBOL(wbinvd_on_all_cpus);
+
+static void __wbnoinvd(void *dummy)
+{
+   wbnoinvd();
+}
+
+int wbnoinvd_on_all_cpus(void)
+{
+   on_each_cpu(__wbnoinvd, NULL, 1);
+   return 0;
+}
+EXPORT_SYMBOL(wbnoinvd_on_all_cpus);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index d6818c6cafda..a5c76a6f8976 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1162,6 +1162,7 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
.write_cr4 = xen_write_cr4,
 
.wbinvd = pv_native_wbinvd,
+   .wbnoinvd = pv_native_wbnoinvd,
 
.read_msr = xen_read_msr,
.write_msr = xen_write_msr,
-- 
2.47.0.338.g60cca15819-goog




Re: [RFC PATCH 1/2] x86, lib, xenpv: Add WBNOINVD helper functions

2024-12-02 Thread Kevin Loughlin
On Mon, Dec 2, 2024 at 5:28 PM Andrew Cooper  wrote:
>
> On 03/12/2024 12:59 am, Kevin Loughlin wrote:
> > diff --git a/arch/x86/include/asm/paravirt.h 
> > b/arch/x86/include/asm/paravirt.h
> > index d4eb9e1d61b8..c040af2d8eff 100644
> > --- a/arch/x86/include/asm/paravirt.h
> > +++ b/arch/x86/include/asm/paravirt.h
> > @@ -187,6 +187,13 @@ static __always_inline void wbinvd(void)
> >   PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT_XEN);
> >  }
> >
> > +extern noinstr void pv_native_wbnoinvd(void);
> > +
> > +static __always_inline void wbnoinvd(void)
> > +{
> > + PVOP_ALT_VCALL0(cpu.wbnoinvd, "wbnoinvd", ALT_NOT_XEN);
> > +}
>
> Given this, ...
>
> > diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> > index fec381533555..a66b708d8a1e 100644
> > --- a/arch/x86/kernel/paravirt.c
> > +++ b/arch/x86/kernel/paravirt.c
> > @@ -149,6 +154,7 @@ struct paravirt_patch_template pv_ops = {
> >   .cpu.write_cr0  = native_write_cr0,
> >   .cpu.write_cr4  = native_write_cr4,
> >   .cpu.wbinvd = pv_native_wbinvd,
> > + .cpu.wbnoinvd   = pv_native_wbnoinvd,
> >   .cpu.read_msr   = native_read_msr,
> >   .cpu.write_msr  = native_write_msr,
> >   .cpu.read_msr_safe  = native_read_msr_safe,
>
> this, and ...
>
> > diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> > index d6818c6cafda..a5c76a6f8976 100644
> > --- a/arch/x86/xen/enlighten_pv.c
> > +++ b/arch/x86/xen/enlighten_pv.c
> > @@ -1162,6 +1162,7 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = 
> > {
> >   .write_cr4 = xen_write_cr4,
> >
> >   .wbinvd = pv_native_wbinvd,
> > + .wbnoinvd = pv_native_wbnoinvd,
> >
> >   .read_msr = xen_read_msr,
> >   .write_msr = xen_write_msr,
>
> this, what is the point having a paravirt hook which is wired to
> native_wbnoinvd() in all cases?
>
> That just seems like overhead for overhead sake.

I'm mirroring what's done for WBINVD here, which was changed to a
paravirt hook in 10a099405fdf ("cpuidle, xenpv: Make more PARAVIRT_XXL
noinstr clean") in order to avoid calls out to instrumented code as
described in the commit message in more detail. I believe a hook is
similarly required for WBNOINVD, but please let me know if you
disagree. Thanks!



Re: [RFC PATCH 1/2] x86, lib, xenpv: Add WBNOINVD helper functions

2024-12-02 Thread Andrew Cooper
On 03/12/2024 12:59 am, Kevin Loughlin wrote:
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index d4eb9e1d61b8..c040af2d8eff 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -187,6 +187,13 @@ static __always_inline void wbinvd(void)
>   PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT_XEN);
>  }
>  
> +extern noinstr void pv_native_wbnoinvd(void);
> +
> +static __always_inline void wbnoinvd(void)
> +{
> + PVOP_ALT_VCALL0(cpu.wbnoinvd, "wbnoinvd", ALT_NOT_XEN);
> +}

Given this, ...

> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index fec381533555..a66b708d8a1e 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -149,6 +154,7 @@ struct paravirt_patch_template pv_ops = {
>   .cpu.write_cr0  = native_write_cr0,
>   .cpu.write_cr4  = native_write_cr4,
>   .cpu.wbinvd = pv_native_wbinvd,
> + .cpu.wbnoinvd   = pv_native_wbnoinvd,
>   .cpu.read_msr   = native_read_msr,
>   .cpu.write_msr  = native_write_msr,
>   .cpu.read_msr_safe  = native_read_msr_safe,

this, and ...

> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index d6818c6cafda..a5c76a6f8976 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1162,6 +1162,7 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
>   .write_cr4 = xen_write_cr4,
>  
>   .wbinvd = pv_native_wbinvd,
> + .wbnoinvd = pv_native_wbnoinvd,
>  
>   .read_msr = xen_read_msr,
>   .write_msr = xen_write_msr,

this, what is the point having a paravirt hook which is wired to
native_wbnoinvd() in all cases?

That just seems like overhead for overhead sake.

~Andrew



[PATCH] x86/paravirt: remove the wbinvd hook

2024-12-02 Thread Juergen Gross
The wbinvd paravirt hook is a leftover of lguest times. Today it is
no longer needed, as all users use the native wbinvd implementation.

Remove the hook and rename native_wbinvd() to wbinvd().

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/paravirt.h   | 7 ---
 arch/x86/include/asm/paravirt_types.h | 2 --
 arch/x86/include/asm/special_insns.h  | 8 +---
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
 arch/x86/kernel/paravirt.c| 6 --
 arch/x86/kernel/process.c | 4 ++--
 arch/x86/xen/enlighten_pv.c   | 2 --
 7 files changed, 4 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d4eb9e1d61b8..041aff51eb50 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -180,13 +180,6 @@ static inline void halt(void)
PVOP_VCALL0(irq.halt);
 }
 
-extern noinstr void pv_native_wbinvd(void);
-
-static __always_inline void wbinvd(void)
-{
-   PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT_XEN);
-}
-
 static inline u64 paravirt_read_msr(unsigned msr)
 {
return PVOP_CALL1(u64, cpu.read_msr, msr);
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 8d4fbe1be489..fea56b04f436 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -86,8 +86,6 @@ struct pv_cpu_ops {
void (*update_io_bitmap)(void);
 #endif
 
-   void (*wbinvd)(void);
-
/* cpuid emulation, mostly so that caps bits can be disabled */
void (*cpuid)(unsigned int *eax, unsigned int *ebx,
  unsigned int *ecx, unsigned int *edx);
diff --git a/arch/x86/include/asm/special_insns.h 
b/arch/x86/include/asm/special_insns.h
index aec6e2d3aa1d..fab7c8af27a4 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -115,7 +115,7 @@ static inline void wrpkru(u32 pkru)
 }
 #endif
 
-static __always_inline void native_wbinvd(void)
+static __always_inline void wbinvd(void)
 {
asm volatile("wbinvd": : :"memory");
 }
@@ -167,12 +167,6 @@ static inline void __write_cr4(unsigned long x)
 {
native_write_cr4(x);
 }
-
-static __always_inline void wbinvd(void)
-{
-   native_wbinvd();
-}
-
 #endif /* CONFIG_PARAVIRT_XXL */
 
 static __always_inline void clflush(volatile void *__p)
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c 
b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 972e6b6b0481..b72f7e91387e 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -459,7 +459,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
 * increase likelihood that allocated cache portion will be filled
 * with associated memory.
 */
-   native_wbinvd();
+   wbinvd();
 
/*
 * Always called with interrupts enabled. By disabling interrupts
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index fec381533555..927e33e6843a 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -116,11 +116,6 @@ static noinstr void pv_native_set_debugreg(int regno, 
unsigned long val)
native_set_debugreg(regno, val);
 }
 
-noinstr void pv_native_wbinvd(void)
-{
-   native_wbinvd();
-}
-
 static noinstr void pv_native_safe_halt(void)
 {
native_safe_halt();
@@ -148,7 +143,6 @@ struct paravirt_patch_template pv_ops = {
.cpu.read_cr0   = native_read_cr0,
.cpu.write_cr0  = native_write_cr0,
.cpu.write_cr4  = native_write_cr4,
-   .cpu.wbinvd = pv_native_wbinvd,
.cpu.read_msr   = native_read_msr,
.cpu.write_msr  = native_write_msr,
.cpu.read_msr_safe  = native_read_msr_safe,
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index f63f8fd00a91..58ead05a1c29 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -825,7 +825,7 @@ void __noreturn stop_this_cpu(void *dummy)
 * X86_FEATURE_SME due to cmdline options.
 */
if (c->extended_cpuid_level >= 0x801f && (cpuid_eax(0x801f) & 
BIT(0)))
-   native_wbinvd();
+   wbinvd();
 
/*
 * This brings a cache line back and dirties it, but
@@ -846,7 +846,7 @@ void __noreturn stop_this_cpu(void *dummy)
/*
 * Use native_halt() so that memory contents don't change
 * (stack usage and variables) after possibly issuing the
-* native_wbinvd() above.
+* wbinvd() above.
 */
native_halt();
}
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index d6818c6cafda..fd2169063480 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1161,8 +1161,6 @@ static const typeof(pv_ops) xen_cpu_ops __ini

Re: [RFC PATCH 1/2] x86, lib, xenpv: Add WBNOINVD helper functions

2024-12-02 Thread Xin Li

On 12/2/2024 5:44 PM, Kevin Loughlin wrote:

On Mon, Dec 2, 2024 at 5:28 PM Andrew Cooper  wrote:


On 03/12/2024 12:59 am, Kevin Loughlin wrote:

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d4eb9e1d61b8..c040af2d8eff 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -187,6 +187,13 @@ static __always_inline void wbinvd(void)
   PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT_XEN);
  }

+extern noinstr void pv_native_wbnoinvd(void);
+
+static __always_inline void wbnoinvd(void)
+{
+ PVOP_ALT_VCALL0(cpu.wbnoinvd, "wbnoinvd", ALT_NOT_XEN);
+}


Given this, ...


diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index fec381533555..a66b708d8a1e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -149,6 +154,7 @@ struct paravirt_patch_template pv_ops = {
   .cpu.write_cr0  = native_write_cr0,
   .cpu.write_cr4  = native_write_cr4,
   .cpu.wbinvd = pv_native_wbinvd,
+ .cpu.wbnoinvd   = pv_native_wbnoinvd,
   .cpu.read_msr   = native_read_msr,
   .cpu.write_msr  = native_write_msr,
   .cpu.read_msr_safe  = native_read_msr_safe,


this, and ...


diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index d6818c6cafda..a5c76a6f8976 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1162,6 +1162,7 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
   .write_cr4 = xen_write_cr4,

   .wbinvd = pv_native_wbinvd,
+ .wbnoinvd = pv_native_wbnoinvd,

   .read_msr = xen_read_msr,
   .write_msr = xen_write_msr,


this, what is the point having a paravirt hook which is wired to
native_wbnoinvd() in all cases?

That just seems like overhead for overhead sake.


I'm mirroring what's done for WBINVD here, which was changed to a
paravirt hook in 10a099405fdf ("cpuidle, xenpv: Make more PARAVIRT_XXL
noinstr clean") in order to avoid calls out to instrumented code as
described in the commit message in more detail. I believe a hook is
similarly required for WBNOINVD, but please let me know if you
disagree. Thanks!


Then the question is why we need to add WBINVD/WBNOINVD to the paravirt
hooks.