date:20250311

Re: [PATCH] automation/cirrus-ci: store xen/.config as an artifact

2025-03-11 Thread Roger Pau Monné

On Mon, Mar 10, 2025 at 06:30:15PM +, Andrew Cooper wrote:
> On 10/03/2025 6:16 pm, Roger Pau Monne wrote:
> > Always store xen/.config as an artifact, renamed to xen-config to match
> > the naming used in the Gitlab CI tests.
> >
> > Reported-by: Andrew Cooper 
> > Signed-off-by: Roger Pau Monné 
> 
> Looking at this, I suspect my failure was caused by trying to capture
> ".config" directly.
> 
> > ---
> >  .cirrus.yml | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/.cirrus.yml b/.cirrus.yml
> > index e2949d99d73a..1a39f5026f9a 100644
> > --- a/.cirrus.yml
> > +++ b/.cirrus.yml
> > @@ -15,6 +15,14 @@ freebsd_template: &FREEBSD_ENV
> >  APPEND_INCLUDES: /usr/local/include
> >  CIRRUS_CLONE_DEPTH: 1
> >  
> > +freebsd_artifacts: &FREEBSD_ARTIFACTS
> > +  always:
> > +rename_script:
> > +  - cp xen/.config xen-config
> > +config_artifacts:
> > +  path: xen-config
> > +  type: text/plain
> 
> Can't this be part of freebsd_template directly?
> 
> Or is there an ordering problem with the regular build_script ?

Exactly, that was my first attempt (placing it in freebsd_template),
but then the collection would be done before the build, as
freebsd_template sets the env variables ahead of the build, see:

https://cirrus-ci.com/task/5086615544004608

Thanks, Roger.

Re: [PATCH 08/23] xen/arm: dom0less seed xenstore grant table entry

2025-03-11 Thread Jason Andryuk


On 2025-03-10 04:21, Jan Beulich wrote:

On 10.03.2025 09:17, Juergen Gross wrote:

On 10.03.25 09:01, Jan Beulich wrote:

On 06.03.2025 23:03, Jason Andryuk wrote:

+shared_entry_v1(gt, idx).flags = flags;
+shared_entry_v1(gt, idx).domid = be_domid;
+shared_entry_v1(gt, idx).frame = frame;
+}


In common code there shouldn't be an assumption that gnttab v1 is in use.


But isn't the grant table in V1 format until the guest potentially switches
to V2?


Strictly speaking it's in v0 format initially. But yes, I see your point.
Provided this function is made clear that it may only ever be used on a
domain that wasn't started yet (perhaps proven by way of an assertion).


Yes, this is what I was relying on.

If d is still passed in, we could do
ASSERT(!d->creation_finished);

Hmmm, the function might could be marked __init, too, I think.

struct grant_table and shared_entry_v1 are defined in 
xen/common/grant_table.c, which is why the function lives there (and 
it'll be useful for hyperlaunch).  To avoid unreachable code, I guess 
I'll wrap it inside #ifdef CONFIG_DOM0LESS_BOOT.


Regards,
Jason

[PATCH v4] xen/console: make console buffer size configurable

2025-03-11 Thread dmkhn

From: Denis Mukhin 

Add new CONRING_SHIFT Kconfig parameter to specify the boot console buffer size
as a power of 2.

The supported range is [14..27] -> [16KiB..128MiB].

Set default to 15 (32 KiB).

Resolves: https://gitlab.com/xen-project/xen/-/issues/185
Signed-off-by: Denis Mukhin 
---
Changes v3->v4:
- s/Link:/Resolves:/ in commit message to auto-close the gitlab ticket
---
 docs/misc/xen-command-line.pandoc |  5 +++--
 xen/drivers/char/Kconfig  | 24 
 xen/drivers/char/console.c|  6 +++---
 3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 89db6e83be..a471a9f7ce 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -425,10 +425,11 @@ The following are examples of correct specifications:
 ### conring_size
 > `= `
 
-> Default: `conring_size=16k`
-
 Specify the size of the console ring buffer.
 
+The default console ring buffer size is selected at build time via
+CONFIG_CONRING_SHIFT setting.
+
 ### console
 > `= List of [ vga | com1[H,L] | com2[H,L] | pv | dbgp | ehci | xhci | none ]`
 
diff --git a/xen/drivers/char/Kconfig b/xen/drivers/char/Kconfig
index e6e12bb413..e238d4eccf 100644
--- a/xen/drivers/char/Kconfig
+++ b/xen/drivers/char/Kconfig
@@ -96,6 +96,30 @@ config SERIAL_TX_BUFSIZE
 
  Default value is 32768 (32KiB).
 
+config CONRING_SHIFT
+   int "Console ring buffer size (power of 2)"
+   range 14 27
+   default 15
+   help
+ Select the boot console ring buffer size as a power of 2.
+ Run-time console ring buffer size is the same as the boot console ring
+ buffer size, unless overridden via 'conring_size=' boot parameter.
+
+   27 => 128 MiB
+   26 =>  64 MiB
+   25 =>  32 MiB
+   24 =>  16 MiB
+   23 =>   8 MiB
+   22 =>   4 MiB
+   21 =>   2 MiB
+   20 =>   1 MiB
+   19 => 512 KiB
+   18 => 256 KiB
+   17 => 128 KiB
+   16 =>  64 KiB
+   15 =>  32 KiB (default)
+   14 =>  16 KiB
+
 config XHCI
bool "XHCI DbC UART driver"
depends on X86
diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index ba428199d2..78794b74e9 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -101,12 +101,12 @@ static int cf_check parse_console_timestamps(const char 
*s);
 custom_runtime_param("console_timestamps", parse_console_timestamps,
  con_timestamp_mode_upd);
 
-/* conring_size: allows a large console ring than default (16kB). */
+/* conring_size: override build-time CONFIG_CONRING_SHIFT setting. */
 static uint32_t __initdata opt_conring_size;
 size_param("conring_size", opt_conring_size);
 
-#define _CONRING_SIZE 16384
-#define CONRING_IDX_MASK(i) ((i)&(conring_size-1))
+#define _CONRING_SIZE   (1U << CONFIG_CONRING_SHIFT)
+#define CONRING_IDX_MASK(i) ((i) & (conring_size - 1))
 static char __initdata _conring[_CONRING_SIZE];
 static char *__read_mostly conring = _conring;
 static uint32_t __read_mostly conring_size = _CONRING_SIZE;
-- 
2.34.1

Re: [PATCH v2 08/16] exec/memory-internal: remove dependency on cpu.h

2025-03-11 Thread Philippe Mathieu-Daudé


On 11/3/25 05:08, Pierrick Bouvier wrote:

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 


Missing the "why" justification we couldn't do that before.


---
  include/exec/memory-internal.h | 2 --
  1 file changed, 2 deletions(-)

diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index 100c1237ac2..b729f3b25ad 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -20,8 +20,6 @@
  #ifndef MEMORY_INTERNAL_H
  #define MEMORY_INTERNAL_H
  
-#include "cpu.h"

-
  #ifndef CONFIG_USER_ONLY
  static inline AddressSpaceDispatch *flatview_to_dispatch(FlatView *fv)
  {

Re: [PATCH v2 07/16] exec/exec-all: remove dependency on cpu.h

2025-03-11 Thread Philippe Mathieu-Daudé


On 11/3/25 05:08, Pierrick Bouvier wrote:

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 


Missing the "why" justification we couldn't do that before.


---
  include/exec/exec-all.h | 1 -
  1 file changed, 1 deletion(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index dd5c40f2233..19b0eda44a7 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -20,7 +20,6 @@
  #ifndef EXEC_ALL_H
  #define EXEC_ALL_H
  
-#include "cpu.h"

  #if defined(CONFIG_USER_ONLY)
  #include "exec/cpu_ldst.h"
  #endif

[PATCH] docs: add explanation for 'Resolved:'

2025-03-11 Thread dmkhn

From: Denis Mukhin 

'Resolves:' tag may be used if the patch addresses one of the tickets
logged via Gitlab to auto-close such ticket when the patch got merged.

Add documentation for the tag.

Signed-off-by: Denis Mukhin 
---
 docs/process/sending-patches.pandoc | 12 
 1 file changed, 12 insertions(+)

diff --git a/docs/process/sending-patches.pandoc 
b/docs/process/sending-patches.pandoc
index 2e74c3b57e..9fc3b407ff 100644
--- a/docs/process/sending-patches.pandoc
+++ b/docs/process/sending-patches.pandoc
@@ -106,6 +106,18 @@ If git was configured as explained earlier, this can be 
retrieved using
 ``git log --pretty=fixes`` otherwise ``git log --abbrev=12 --oneline`` will
 give the proper tag and commit-id.
 
+### Resolves:
+
+If your patch addresses an issue logged in a GitLab ticket, use the `Resolves:`
+tag followed by the issue link to automatically close the ticket when the patch
+is merged.
+
+Resolves: 
+
+E.g.:
+
+Resolves: https://gitlab.com/xen-project/xen/-/issues/185
+
 ### Backport:
 
 A backport tag is an optional tag in the commit message to request a
-- 
2.34.1

Re: [PATCH v2 11/16] exec/ram_addr: call xen_hvm_modified_memory only if xen is enabled

2025-03-11 Thread Philippe Mathieu-Daudé


On 11/3/25 05:08, Pierrick Bouvier wrote:

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 


I didn't follow this direction because Richard had a preference
on removing unnecessary inlined functions:
https://lore.kernel.org/qemu-devel/9151205a-13d3-401e-b403-f9195cdb1...@linaro.org/


---
  include/exec/ram_addr.h | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 7c011fadd11..098fccb5835 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -342,7 +342,9 @@ static inline void 
cpu_physical_memory_set_dirty_range(ram_addr_t start,
  }
  }
  
-xen_hvm_modified_memory(start, length);

+if (xen_enabled()) {
+xen_hvm_modified_memory(start, length);
+}
  }

[PATCH v2] docs: specify numerical values of Xenstore commands

2025-03-11 Thread Juergen Gross

In docs/misc/xenstore.txt all Xenstore commands are specified, but
the specifications lack the numerical values of the commands.

Add a table with all commands, their values, and a potential remark
(e.g. whether the command is optional).

Reported-by: Jan Beulich 
Signed-off-by: Juergen Gross 
---
V2:
- replace "ŕ" with plain "r" (Jan Beulich)
- replace hard tabs with blanks (Jan Beulich)
- allow GET_FEATURES and GET_QUOTA support without SET_* (Jan Beulich)
---
 docs/misc/xenstore.txt | 57 ++
 1 file changed, 57 insertions(+)

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index 7e1f031520..8b4b790e11 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -86,6 +86,63 @@ parts of xenstore inaccessible to some clients.  In any case 
passing
 bulk data through xenstore is not recommended as the performance
 properties are poor.
 
+-- Defined Xenstore message types --
+
+Below is a table with all defined Xenstore message types (type name
+and its associated numerical value).
+
+Some types are optional to be supported by a specific Xenstore
+implementation.  If an optional type is not supported by a Xenstore
+implementation, Xen tools will continue to work, maybe with slightly
+reduced functionality.  A mandatory type not being supported will
+result in severely reduced functionality, like inability to create
+domains.  In case a type is optional, this is stated in the table with
+the lost functionality in case Xenstore doesn't support that type.
+
+XS_CONTROL   0optional
+If not supported, xenstore-control command will not work.
+XS_DEBUG is a deprecated alias of XS_CONTROL.
+XS_DIRECTORY 1
+XS_READ  2
+XS_GET_PERMS 3
+XS_WATCH 4
+XS_UNWATCH   5
+XS_TRANSACTION_START 6
+XS_TRANSACTION_END   7
+XS_INTRODUCE 8
+XS_RELEASE   9
+XS_GET_DOMAIN_PATH  10
+XS_WRITE11
+XS_MKDIR12
+XS_RM   13
+XS_SET_PERMS14
+XS_WATCH_EVENT  15
+Not valid in client sent messages.
+Only valid in Xenstore replies.
+XS_ERROR16
+Not valid in client sent messages.
+Only valid in Xenstore replies.
+XS_IS_DOMAIN_INTRODUCED 17
+XS_RESUME   18
+XS_SET_TARGET   19
+XS_RESTRICT 20no longer supported
+XS_RESTRICT has been removed, the type value 20 is invalid.
+XS_RESET_WATCHES21
+XS_DIRECTORY_PART   22optional
+If not supported, the output of xenstore-ls might be incomplete
+with more than ca. 1000 domains active.
+XS_GET_FEATURE  23optional
+XS_SET_FEATURE  24optional
+XS_SET_FEATURE requires XS_GET_FEATURE to be supported.
+If unsupported, setting availability of Xenstore features per
+domain is not possible.
+XS_GET_QUOTA25optional
+XS_SET_QUOTA26optional
+XS_SET_QUOTA requires XS_GET_QUOTA to be supported.
+If unsupported, setting of Xenstore quota per domain is not
+possible.
+XS_INVALID   65535
+Guaranteed invalid type (never supported).
 
 -- Xenstore protocol details - introduction --
 
-- 
2.43.0

Re: [PATCH] docs: add explanation for 'Resolved:'

2025-03-11 Thread Denis Mukhin

On Tuesday, March 11th, 2025 at 12:28 AM, dm...@proton.me  
wrote:

> 
> 
> From: Denis Mukhin dmuk...@ford.com
> 
> 
> 'Resolves:' tag may be used if the patch addresses one of the tickets
> logged via Gitlab to auto-close such ticket when the patch got merged.
> 
> Add documentation for the tag.
> 
> Signed-off-by: Denis Mukhin dmuk...@ford.com

Whoops, I made a typo in the subject line of the commit message:
  s/Resolved/Resolves/

If all looks good, can I ask to fix the typo on commit?

Thanks,
Denis

> 
> ---
> docs/process/sending-patches.pandoc | 12 
> 1 file changed, 12 insertions(+)
> 
> diff --git a/docs/process/sending-patches.pandoc 
> b/docs/process/sending-patches.pandoc
> index 2e74c3b57e..9fc3b407ff 100644
> --- a/docs/process/sending-patches.pandoc
> +++ b/docs/process/sending-patches.pandoc
> @@ -106,6 +106,18 @@ If git was configured as explained earlier, this can be 
> retrieved using
> `git log --pretty=fixes` otherwise `git log --abbrev=12 --oneline` will
> give the proper tag and commit-id.
> 
> +### Resolves:
> +
> +If your patch addresses an issue logged in a GitLab ticket, use the 
> `Resolves:`
> +tag followed by the issue link to automatically close the ticket when the 
> patch
> +is merged.
> +
> + Resolves: 
> 
> +
> +E.g.:
> +
> + Resolves: https://gitlab.com/xen-project/xen/-/issues/185
> +
> ### Backport:
> 
> A backport tag is an optional tag in the commit message to request a
> --
> 2.34.1

Re: [PATCH v3 2/2] x86/iommu: avoid MSI address and data writes if IRT index hasn't changed

2025-03-11 Thread Jan Beulich

On 10.03.2025 16:42, Roger Pau Monné wrote:
> On Mon, Mar 10, 2025 at 11:51:09AM +0100, Jan Beulich wrote:
>> On 10.03.2025 10:55, Roger Pau Monne wrote:
>>> Attempt to reduce the MSI entry writes, and the associated checking whether
>>> memory decoding and MSI-X is enabled for the PCI device, when the MSI data
>>> hasn't changed.
>>>
>>> When using Interrupt Remapping the MSI entry will contain an index into
>>> the remapping table, and it's in such remapping table where the MSI vector
>>> and destination CPU is stored.  As such, when using interrupt remapping,
>>> changes to the interrupt affinity shouldn't result in changes to the MSI
>>> entry, and the MSI entry update can be avoided.
>>>
>>> Signal from the IOMMU update_ire_from_msi hook whether the MSI data or
>>> address fields have changed, and thus need writing to the device registers.
>>> Such signaling is done by returning 1 from the function.  Otherwise
>>> returning 0 means no update of the MSI fields, and thus no write
>>> required.
>>>
>>> Signed-off-by: Roger Pau Monné 
>>
>> Reviewed-by: Jan Beulich 
>> with two purely cosmetic suggestions and an only loosely related question 
>> below.
>>
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -415,7 +415,9 @@ static int cf_check vmx_pi_update_irte(const struct 
>>> vcpu *v,
>>>  
>>>  ASSERT_PDEV_LIST_IS_READ_LOCKED(msi_desc->dev->domain);
>>>  
>>> -return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
>>> +rc = iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
>>> +
>>> +return rc < 0 ? rc : 0;
>>
>> Only tangential here, but: Why does this function have a return type of
>> non-void, when neither caller cares?
> 
> I'm afraid there's more wrong in vmx_pi_update_irte() that I've just
> spotted afterwards.
> 
> vmx_pi_update_irte() passes to iommu_update_ire_from_msi() the
> msi_desc->msg field, but that field is supposed to always contain the
> non-translated MSI data, as you correctly pointed out in v1 it's
> consumed by dump_msi().  vmx_pi_update_irte() using msi_desc->msg to
> store the translated MSI effectively breaks dump_msi().

Oh, indeed - it violates what write_msi_msg() specifically checks by
an assertion.

> Also vmx_pi_update_irte() relies on the IRT index never changing, as
> otherwise it's missing any logic to update the MSI registers.

Isn't this a valid assumption here? msi_msg_to_remap_entry() will only
ever allocate a new index if none was previously allocated.

>>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
>>> @@ -492,7 +492,7 @@ static int update_intremap_entry_from_msi_msg(
>>> get_ivrs_mappings(iommu->seg)[alias_id].intremap_table);
>>>  }
>>>  
>>> -return 0;
>>> +return !fresh ? 0 : 1;
>>>  }
>>
>> Simply
>>
>> return fresh;
>>
>> ?
>>
>>> @@ -546,7 +546,7 @@ int cf_check amd_iommu_msi_msg_update_ire(
>>>  rc = update_intremap_entry_from_msi_msg(iommu, bdf, nr,
>>>  &msi_desc->remap_index,
>>>  msg, &data);
>>> -if ( !rc )
>>> +if ( rc > 0 )
>>>  {
>>>  for ( i = 1; i < nr; ++i )
>>>  msi_desc[i].remap_index = msi_desc->remap_index + i;
>>> --- a/xen/drivers/passthrough/vtd/intremap.c
>>> +++ b/xen/drivers/passthrough/vtd/intremap.c
>>> @@ -506,6 +506,7 @@ static int msi_msg_to_remap_entry(
>>>  unsigned int index, i, nr = 1;
>>>  unsigned long flags;
>>>  const struct pi_desc *pi_desc = msi_desc->pi_desc;
>>> +bool alloc = false;
>>>  
>>>  if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
>>>  nr = msi_desc->msi.nvec;
>>> @@ -529,6 +530,7 @@ static int msi_msg_to_remap_entry(
>>>  index = alloc_remap_entry(iommu, nr);
>>>  for ( i = 0; i < nr; ++i )
>>>  msi_desc[i].remap_index = index + i;
>>> +alloc = true;
>>>  }
>>>  else
>>>  index = msi_desc->remap_index;
>>> @@ -601,7 +603,7 @@ static int msi_msg_to_remap_entry(
>>>  unmap_vtd_domain_page(iremap_entries);
>>>  spin_unlock_irqrestore(&iommu->intremap.lock, flags);
>>>  
>>> -return 0;
>>> +return alloc ? 1 : 0;
>>>  }
>>
>> Like above, simply
>>
>> return alloc;
>>
>> ?
> 
> I wasn't sure whether this was overloading the boolean type and
> possibly breaking some MISRA rule.  I can adjust.

What we are to do with Misra's essential type system is entirely unclear at
this point, aiui.

Jan

Re: [PATCH] xen/mcelog: Add __nonstring annotations for unterminated strings

2025-03-11 Thread Jürgen Groß


On 10.03.25 23:22, Kees Cook wrote:

When a character array without a terminating NUL character has a static
initializer, GCC 15's -Wunterminated-string-initialization will only
warn if the array lacks the "nonstring" attribute[1]. Mark the arrays
with __nonstring to and correctly identify the char array as "not a C
string" and thereby eliminate the warning.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1]
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Oleksandr Tyshchenko 
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Kees Cook 


Acked-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature

Re: [PATCH 06/23] xen/domctl: Expose privileged and hardware capabilities

2025-03-11 Thread Jan Beulich

On 10.03.2025 15:11, Jason Andryuk wrote:
> On 2025-03-10 05:03, Jan Beulich wrote:
>> On 06.03.2025 23:03, Jason Andryuk wrote:
>>> --- a/xen/include/public/domctl.h
>>> +++ b/xen/include/public/domctl.h
>>> @@ -155,6 +155,12 @@ struct xen_domctl_getdomaininfo {
>>>   /* domain has hardware assisted paging */
>>>   #define _XEN_DOMINF_hap   8
>>>   #define XEN_DOMINF_hap(1U<<_XEN_DOMINF_hap)
>>> +/* domain is hardware domain */
>>> +#define _XEN_DOMINF_hardware  9
>>> +#define XEN_DOMINF_hardware   (1U<<_XEN_DOMINF_hardware)
>>> +/* domain is privileged */
>>> +#define _XEN_DOMINF_priv  10
>>> +#define XEN_DOMINF_priv   (1U<<_XEN_DOMINF_priv)
>>
>> Oh, and: If we really need both constants (I doubt we do), the latter wants
>> to follow style even if all of its siblings don't (i.e. blanks around binary
>> operators).
> 
> Ok on this and the rename.
> 
> Why do you think they are not necessary?  I did not see a way to expose 
> the capabilities for other domains.
> 
> Or do you mean if they are added to XEN_DOMCTL_get_domain_state that 
> won't be necessary?

Oh, I guess "both" was ambiguous: I meant both _XEN_DOMINF_* and XEN_DOMINF_*.
Of course we need both a hardware and control bit here.

Jan

Re: [PATCH] tools: Mark ACPI SDTs as NVS in the PVH build path

2025-03-11 Thread Alejandro Vallejo

On Tue Mar 11, 2025 at 8:30 AM GMT, Jan Beulich wrote:
> On 10.03.2025 16:25, Alejandro Vallejo wrote:
> > Commit cefeffc7e583 marked ACPI tables as NVS in the hvmloader path
> > because SeaBIOS may otherwise just mark it as RAM. There is, however,
> > yet another reason to do it even in the PVH path. Xen's incarnation of
> > AML relies on having access to some ACPI tables (e.g: _STA of Processor
> > objects relies on reading the processor online bit in its MADT entry)
> > 
> > This is problematic if the OS tries to reclaim ACPI memory for page
> > tables as it's needed for runtime and can't be reclaimed after the OSPM
> > is up and running.
> > 
> > Fixes: de6d188a519f("hvmloader: flip "ACPI data" to "ACPI NVS" type for 
> > ACPI table region)"
> > Signed-off-by: Alejandro Vallejo 
> > ---
> > I really, really, really dislike this idea of accessing the MADT from
> > AML.
>
> I think this isn't Xen's invention, but something I've seen in various
> systems' AML.
>

Do you mean ACPI hotplug? I don't think I've ever seen a real system with
support for it. I don't suppose you remember any specifically? I'd be quite
interested to have a look at their ACPI dumps.

> > In time I'll try to implement something to stop doing it, but for
> > the time being I find it preferable to align libxl to hvmloader rather
> > than trying to restrict what's reclaimable and what isn't.
> > ---
> >  tools/firmware/hvmloader/e820.c | 4 
> >  tools/libs/light/libxl_x86.c| 2 +-
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/firmware/hvmloader/e820.c 
> > b/tools/firmware/hvmloader/e820.c
> > index c490a0bc790c..86d39544e887 100644
> > --- a/tools/firmware/hvmloader/e820.c
> > +++ b/tools/firmware/hvmloader/e820.c
> > @@ -210,6 +210,10 @@ int build_e820_table(struct e820entry *e820,
> >   * space reuse by an ACPI unaware / buggy bootloader, option ROM, etc.
> >   * before an ACPI OS takes control. This is possible due to the fact 
> > that
> >   * ACPI NVS memory is explicitly described as non-reclaimable in ACPI 
> > spec.
> > + *
> > + * Furthermore, Xen relies on accessing ACPI tables from within the AML
> > + * code exposed to guests. So Xen's ACPI tables are not, in general,
> > + * reclaimable.
> >   */
> >  
> >  if ( acpi_enabled )
> > diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
> > index a3164a3077fe..265da8072a59 100644
> > --- a/tools/libs/light/libxl_x86.c
> > +++ b/tools/libs/light/libxl_x86.c
> > @@ -742,7 +742,7 @@ static int domain_construct_memmap(libxl__gc *gc,
> >  e820[nr].addr = dom->acpi_modules[i].guest_addr_out & 
> > ~(page_size - 1);
> >  e820[nr].size = dom->acpi_modules[i].length +
> >  (dom->acpi_modules[i].guest_addr_out & (page_size - 1));
> > -e820[nr].type = E820_ACPI;
> > +e820[nr].type = E820_NVS;
>
> This likely needs a comment then, too.

Fair enough. Let me send a v2 with that.

>
> Jan

Cheers,
Alejandro

Re: [PATCH 1/2] xen/arm: Improve handling of nr_spis

2025-03-11 Thread Bertrand Marquis

Hi Michal,

> On 11 Mar 2025, at 10:04, Michal Orzel  wrote:
> 
> At the moment, we print a warning about max number of IRQs supported by
> GIC bigger than vGIC only for hardware domain. This check is not hwdom
> special, and should be made common. Also, in case of user not specifying
> nr_spis for dom0less domUs, we should take into account max number of
> IRQs supported by vGIC if it's smaller than for GIC.
> 
> Introduce VGIC_MAX_IRQS macro and use it instead of hardcoded 992 value.
> Fix calculation of nr_spis for dom0less domUs and make the GIC/vGIC max
> IRQs comparison common.
> 
> Signed-off-by: Michal Orzel 
> ---
> xen/arch/arm/dom0less-build.c   | 2 +-
> xen/arch/arm/domain_build.c | 9 ++---
> xen/arch/arm/gic.c  | 3 +++
> xen/arch/arm/include/asm/vgic.h | 3 +++
> 4 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
> index 31f31c38da3f..9a84fee94119 100644
> --- a/xen/arch/arm/dom0less-build.c
> +++ b/xen/arch/arm/dom0less-build.c
> @@ -1018,7 +1018,7 @@ void __init create_domUs(void)
> {
> int vpl011_virq = GUEST_VPL011_SPI;
> 
> -d_cfg.arch.nr_spis = gic_number_lines() - 32;
> +d_cfg.arch.nr_spis = min(gic_number_lines(), VGIC_MAX_IRQS) - 32;

I would suggest to introduce a static inline gic_nr_spis in a gic header ...

> 
> /*
>  * The VPL011 virq is GUEST_VPL011_SPI, unless direct-map is
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 7cc141ef75e9..b99c4e3a69bf 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -2371,13 +2371,8 @@ void __init create_dom0(void)
> 
> /* The vGIC for DOM0 is exactly emulating the hardware GIC */
> dom0_cfg.arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE;
> -/*
> - * Xen vGIC supports a maximum of 992 interrupt lines.
> - * 32 are substracted to cover local IRQs.
> - */
> -dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 992) - 32;
> -if ( gic_number_lines() > 992 )
> -printk(XENLOG_WARNING "Maximum number of vGIC IRQs exceeded.\n");
> +/* 32 are substracted to cover local IRQs */
> +dom0_cfg.arch.nr_spis = min(gic_number_lines(), VGIC_MAX_IRQS) - 32;

and reuse it here to make sure the value used is always the same.

> dom0_cfg.arch.tee_type = tee_get_type();
> dom0_cfg.max_vcpus = dom0_max_vcpus();
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index acf61a4de373..e80fe0ca2421 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -251,6 +251,9 @@ void __init gic_init(void)
> panic("Failed to initialize the GIC drivers\n");
> /* Clear LR mask for cpu0 */
> clear_cpu_lr_mask();
> +
> +if ( gic_number_lines() > VGIC_MAX_IRQS )
> +printk(XENLOG_WARNING "Maximum number of vGIC IRQs exceeded\n");

I am a bit unsure with this one.
If this is the case, it means any gicv2 or gicv3 init detected an impossible 
value and
any usage of gic_number_lines would be using an impossible value.

Shouldn't this somehow result in a panic as the gic detection was wrong ?
Do you think we can continue to work safely if this value is over VGIC_MAX_IRQS.
There are other places using gic_number_lines like in irq.c.

Cheers
Bertrand


> }
> 
> void send_SGI_mask(const cpumask_t *cpumask, enum gic_sgi sgi)
> diff --git a/xen/arch/arm/include/asm/vgic.h b/xen/arch/arm/include/asm/vgic.h
> index e309dca1ad01..c549e5840bfa 100644
> --- a/xen/arch/arm/include/asm/vgic.h
> +++ b/xen/arch/arm/include/asm/vgic.h
> @@ -329,6 +329,9 @@ extern void vgic_check_inflight_irqs_pending(struct vcpu 
> *v,
>  */
> #define vgic_num_irqs(d)((d)->arch.vgic.nr_spis + 32)
> 
> +/* Maximum number of IRQs supported by vGIC */
> +#define VGIC_MAX_IRQS 992U
> +
> /*
>  * Allocate a guest VIRQ
>  *  - spi == 0 => allocate a PPI. It will be the same on every vCPU
> -- 
> 2.25.1
>

Re: [PATCH v2 1/5] xen/arm: Create tee command line parameter

2025-03-11 Thread Bertrand Marquis

Hi Jan,

> On 11 Mar 2025, at 09:35, Jan Beulich  wrote:
> 
> On 10.03.2025 15:50, Bertrand Marquis wrote:
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -2651,6 +2651,20 @@ Specify the per-cpu trace buffer size in pages.
>> 
>> Flag to enable TSC deadline as the APIC timer mode.
>> 
>> +### tee
>> +> `= `
> 
> This wants an arch restriction, like we have for other command line options
> supported only by one arch.

Oh yes definitely.
I will fix that in the next version.

Thanks
Bertrand

> 
> Jan
> 
>> +Specify the TEE mediator to be probed and use.
>> +
>> +The default behaviour is to probe all supported TEEs supported by Xen and 
>> use
>> +the first one successfully probed. When this parameter is passed, Xen will
>> +probe only the TEE mediator passed as argument and boot will fail if this
>> +mediator is not properly probed or if the requested TEE is not supported by
>> +Xen.
>> +
>> +This parameter can be set to `optee` of `ffa` if the corresponding mediators
>> +are compiled in.
>> +
>> ### tevt_mask
>>> `= `
>> 
>

Re: [PATCH v8 7/9] docs: update xenstore migration stream definition

2025-03-11 Thread Julien Grall


Hi Juergen,

On 04/02/2025 11:34, Juergen Gross wrote:

In order to close a race window for Xenstore live update when using
the new unique_id of domains, the migration stream needs to contain
this unique_id for each domain known by Xenstore.

Signed-off-by: Juergen Gross 
---
V8:
- new patch
---
  docs/designs/xenstore-migration.md | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/docs/designs/xenstore-migration.md 
b/docs/designs/xenstore-migration.md
index 082314bf72..fba691ee0d 100644
--- a/docs/designs/xenstore-migration.md
+++ b/docs/designs/xenstore-migration.md
@@ -156,7 +156,7 @@ the domain being migrated.
  ```
  0   1   2   3   4   5   6   7octet
  +---+---+---+---+---+---+---+---+
-| conn-id   | conn-type |   |
+| conn-id   | conn-type | uniq-id-off   |
  +---+---+---+

>   | conn-spec>   ...

@@ -165,6 +165,9 @@ the domain being migrated.
  +---+---+---+
  | data
  ...
++---+
+| unique-id |
++---+
  ```
  
  
@@ -178,6 +181,12 @@ the domain being migrated.

  || 0x0001: socket   |
  || 0x0002 - 0x: reserved for future use |
  ||  |
+| `uniq-id-off`  | The offset (in octets) of the `unique-id`|
+|| field from the start of the record body. |
+|| If 0, no `unique-id` field is present.   |
+|| Only needed for `shared ring` connection in  |
+|| live update streams. |
+||  |


Looking at the rest of the record, the unique ID would be past the 
in-data (length is 2-byte) and the out-data (length is 4-byte). So 
technically the offset would need 5-bytes. But here you are using 
2-bytes. Can you explain why?


Cheers,

--
Julien Grall

[PATCH v2 0/7] Add support for R-Car Gen4 PCI host controller

2025-03-11 Thread Mykyta Poturai

This series adds support for R-Car Gen4 PCI host controller.

To fully support the controller, the following changes were made:
- Generic mechanism to support PCI child buses is added.
- Private data for PCI host bridge and means to access it are added.

The series also includes a workaround for proper ATU propramming and
optimizations to lessen the performance impact of that workaround.

The series was tested as a part of the pci-passthrough patches[1] and
build-tested standalone with enabled HAS_PCI and HAS_VPCI.

[1] https://github.com/Deedone/xen/tree/pci_passthrough_wip

v1->v2:
* see individual patches

Oleksandr Andrushchenko (4):
  xen/arm: allow PCI host bridge to have private data
  xen/arm: make pci_host_common_probe return the bridge
  xen/arm: add support for PCI child bus
  xen/arm: add support for R-Car Gen4 PCI host controller

Volodymyr Babchuk (3):
  xen/arm: rcar4: add delay after programming ATU
  xen/arm: rcar4: add simple optimization to avoid ATU reprogramming
  xen/arm: rcar4: program ATU to accesses to all functions

 xen/arch/arm/include/asm/pci.h  |  16 +-
 xen/arch/arm/pci/Makefile   |   2 +
 xen/arch/arm/pci/ecam.c |  17 +-
 xen/arch/arm/pci/pci-access.c   |  37 ++-
 xen/arch/arm/pci/pci-designware.c   | 422 
 xen/arch/arm/pci/pci-designware.h   | 105 +++
 xen/arch/arm/pci/pci-host-common.c  | 106 +--
 xen/arch/arm/pci/pci-host-generic.c |   2 +-
 xen/arch/arm/pci/pci-host-rcar4.c   | 104 +++
 xen/arch/arm/pci/pci-host-zynqmp.c  |   2 +-
 xen/arch/arm/vpci.c |  83 --
 11 files changed, 846 insertions(+), 50 deletions(-)
 create mode 100644 xen/arch/arm/pci/pci-designware.c
 create mode 100644 xen/arch/arm/pci/pci-designware.h
 create mode 100644 xen/arch/arm/pci/pci-host-rcar4.c

-- 
2.34.1

[PATCH v2 4/7] xen/arm: add support for R-Car Gen4 PCI host controller

2025-03-11 Thread Mykyta Poturai

From: Oleksandr Andrushchenko 

Add support for Renesas R-Car Gen4 PCI host controller, specifically
targeting the S4 and V4H SoCs. The implementation includes configuration
read/write operations for both root and child buses. For accessing the
child bus, iATU is used for address translation.

Code common to all DesignWare PCI host controllers is located in a
separate file to allow for easy reuse in other DesignWare-based PCI
host controllers.

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Mykyta Poturai 
---
v1->v2:
* move designware code in a separate file
---
 xen/arch/arm/pci/Makefile |   2 +
 xen/arch/arm/pci/pci-designware.c | 409 ++
 xen/arch/arm/pci/pci-designware.h | 105 
 xen/arch/arm/pci/pci-host-rcar4.c | 104 
 4 files changed, 620 insertions(+)
 create mode 100644 xen/arch/arm/pci/pci-designware.c
 create mode 100644 xen/arch/arm/pci/pci-designware.h
 create mode 100644 xen/arch/arm/pci/pci-host-rcar4.c

diff --git a/xen/arch/arm/pci/Makefile b/xen/arch/arm/pci/Makefile
index 1d045ade01..ca6135e282 100644
--- a/xen/arch/arm/pci/Makefile
+++ b/xen/arch/arm/pci/Makefile
@@ -4,3 +4,5 @@ obj-y += pci-host-generic.o
 obj-y += pci-host-common.o
 obj-y += ecam.o
 obj-y += pci-host-zynqmp.o
+obj-y += pci-designware.o
+obj-y += pci-host-rcar4.o
diff --git a/xen/arch/arm/pci/pci-designware.c 
b/xen/arch/arm/pci/pci-designware.c
new file mode 100644
index 00..6ab03cf9b0
--- /dev/null
+++ b/xen/arch/arm/pci/pci-designware.c
@@ -0,0 +1,409 @@
+/*
+ * Based on Linux drivers/pci/controller/pci-host-common.c
+ * Based on Linux drivers/pci/controller/pci-host-generic.c
+ * Based on xen/arch/arm/pci/pci-host-generic.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+
+#include "pci-designware.h"
+/**
+ * upper_32_bits - return bits 32-63 of a number
+ * @n: the number we're accessing
+ *
+ * A basic shift-right of a 64- or 32-bit quantity.  Use this to suppress
+ * the "right shift count >= width of type" warning when that quantity is
+ * 32-bits.
+ */
+#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
+
+/**
+ * lower_32_bits - return bits 0-31 of a number
+ * @n: the number we're accessing
+ */
+#define lower_32_bits(n) ((uint32_t)((n) & 0x))
+
+static int dw_pcie_read(void __iomem *addr, int size, uint32_t *val)
+{
+if ( !IS_ALIGNED((uintptr_t)addr, size) )
+{
+*val = 0;
+return PCIBIOS_BAD_REGISTER_NUMBER;
+}
+
+if (size == 4)
+*val = readl(addr);
+else if (size == 2)
+*val = readw(addr);
+else if (size == 1)
+*val = readb(addr);
+else
+{
+*val = 0;
+return PCIBIOS_BAD_REGISTER_NUMBER;
+}
+
+return PCIBIOS_SUCCESSFUL;
+}
+
+static int dw_pcie_write(void __iomem *addr, int size, uint32_t val)
+{
+if ( !IS_ALIGNED((uintptr_t)addr, size) )
+return PCIBIOS_BAD_REGISTER_NUMBER;
+
+if (size == 4)
+writel(val, addr);
+else if (size == 2)
+writew(val, addr);
+else if (size == 1)
+writeb(val, addr);
+else
+return PCIBIOS_BAD_REGISTER_NUMBER;
+
+return PCIBIOS_SUCCESSFUL;
+}
+
+static uint32_t dw_pcie_read_dbi(struct pci_host_bridge *bridge,
+   uint32_t reg, size_t size)
+{
+void __iomem *addr = bridge->cfg->win + reg;
+uint32_t val;
+
+dw_pcie_read(addr, size, &val);
+return val;
+}
+
+static void dw_pcie_write_dbi(struct pci_host_bridge *bridge,
+uint32_t reg, size_t size, uint32_t val)
+{
+void __iomem *addr = bridge->cfg->win + reg;
+
+dw_pcie_write(addr, size, val);
+}
+
+static uint32_t dw_pcie_readl_dbi(struct pci_host_bridge *bridge, uint32_t reg)
+{
+return dw_pcie_read_dbi(bridge, reg, sizeof(uint32_t));
+}
+
+static void dw_pcie_writel_dbi(struct pci_host_bridge *pci, uint32_t reg,
+   uint32_t val)
+{
+dw_pcie_write_dbi(pci, reg, sizeof(uint32_t), val);
+}
+
+static void dw_pcie_read_iatu_unroll_enabled(struct pci_host_bridge *bridge)
+{
+struct dw_pcie_priv *priv = bridge->priv;
+uint32_t val;
+
+val = dw_pcie_readl_dbi(bridge, PCIE_ATU_VIEWPORT);
+if (val == 0x)
+priv->iatu_unroll_enabled = true;
+
+printk(XENLOG_DEBUG "%s iATU unroll: %sabled\n",
+   dt_node_full_name(bridge->dt_node),
+   priv->iatu_unroll_enabled ? "en" : "dis");

Re: [PATCH 16/16] CHANGELOG: Mention Xen suspend/resume to RAM feature on arm64

2025-03-11 Thread Oleksii Kurochko



On 3/5/25 10:11 AM, Mykola Kvach wrote:

Signed-off-by: Mykola Kvach
---
  CHANGELOG.md | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 04c21d5bce..489404fc8b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,8 @@ The format is based on [Keep a 
Changelog](https://keepachangelog.com/en/1.0.0/)
  ## [4.21.0 
UNRELEASED](https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=staging)
 - TBD
  
  ### Changed

+ - On Arm:
+   - Support for suspend/resume to/from RAM on arm64


With proper Acks for the rest of the patches of this patch series: Oleksii 
Kurochko

~ Oleksii

  
  ### Added

[PATCH v2 6/7] xen/arm: rcar4: add simple optimization to avoid ATU reprogramming

2025-03-11 Thread Mykyta Poturai

From: Volodymyr Babchuk 

There are high chances that there will be a number of a consecutive
accesses to configuration space of one device. To speed things up,
we can program ATU only during first access.

This is mostly beneficial taking into account the previous patch that
adds 1ms delay after ATU reprogramming.

Signed-off-by: Volodymyr Babchuk 
Signed-off-by: Mykyta Poturai 
---
v1->v2:
* rebased
---
 xen/arch/arm/pci/pci-designware.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/arm/pci/pci-designware.c 
b/xen/arch/arm/pci/pci-designware.c
index def2c12d63..cec52cf81a 100644
--- a/xen/arch/arm/pci/pci-designware.c
+++ b/xen/arch/arm/pci/pci-designware.c
@@ -272,6 +272,14 @@ static void dw_pcie_prog_outbound_atu(struct 
pci_host_bridge *pci, int index,
   int type, uint64_t cpu_addr,
   uint64_t pci_addr, uint64_t size)
 {
+static uint64_t prev_addr = ~0;
+
+/* Simple optimization to not-program ATU for every transaction */
+if (prev_addr == pci_addr)
+return;
+
+prev_addr = pci_addr;
+
 __dw_pcie_prog_outbound_atu(pci, 0, index, type,
 cpu_addr, pci_addr, size);
 }
-- 
2.34.1

Re: [PATCH v8 1/9] xen/events: don't allow binding a global virq from any domain

2025-03-11 Thread Jürgen Groß


On 11.03.25 10:35, Julien Grall wrote:

Hi Juergen,

On 04/02/2025 11:33, Juergen Gross wrote:

Today Xen will happily allow binding a global virq by a domain which
isn't configured to receive it. This won't result in any bad actions,
but the bind will appear to have succeeded with no event ever being
received by that event channel.

Instead of allowing the bind, error out if the domain isn't set to
handle that virq. Note that this check is inside the write_lock() on
purpose, as a future patch will put a related check into
set_global_virq_handler() with the addition of using the same lock.

 > > Signed-off-by: Juergen Gross 

I see this patch was already committed. But I have a question about the logic.


---
V6:
- new patch
V7:
- move handling domain check inside locked region (Jan Beulich)
- style fix (Jan Beulich)
---
  xen/common/event_channel.c | 21 +
  1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 46281b16ce..cd6f5a1211 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -120,6 +120,13 @@ static uint8_t 
get_xen_consumer(xen_event_channel_notification_t fn)

  /* Get the notification function for a given Xen-bound event channel. */
  #define xen_notification_fn(e) (xen_consumers[(e)->xen_consumer-1])
+static struct domain *__read_mostly global_virq_handlers[NR_VIRQS];
+
+static struct domain *get_global_virq_handler(unsigned int virq)
+{
+    return global_virq_handlers[virq] ?: hardware_domain;
+}
+
  static bool virq_is_global(unsigned int virq)
  {
  switch ( virq )
@@ -469,6 +476,7 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, 
evtchn_port_t port)

  struct domain *d = current->domain;
  int    virq = bind->virq, vcpu = bind->vcpu;
  int    rc = 0;
+    bool   is_global;
  if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
  return -EINVAL;
@@ -478,8 +486,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, 
evtchn_port_t port)

  * speculative execution.
  */
  virq = array_index_nospec(virq, ARRAY_SIZE(v->virq_to_evtchn));
+    is_global = virq_is_global(virq);
-    if ( virq_is_global(virq) && (vcpu != 0) )
+    if ( is_global && vcpu != 0 )
  return -EINVAL;
  if ( (v = domain_vcpu(d, vcpu)) == NULL )
@@ -487,6 +496,12 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, 
evtchn_port_t port)

  write_lock(&d->event_lock);
+    if ( is_global && get_global_virq_handler(virq) != d )


What prevent a race between get_global_virq_handler() and 
set_global_virq_handler()? Also, it is not clear in the implementation of 
get_global_virq_handler() that it will ever only read global_virq_handlers[virq] 
once.


set_global_virq_handler() is taking the event_lock of the domain
registered as handler.

So if a domain is registered for handling a virq, d->event_lock is
protecting against the handling domain to be changed. Concurrent
calls of set_global_virq_handler() are handled via taking the
global_virq_handlers_lock spin_lock.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature

[RFC PATCH v3 4/7] xen/arm: scmi: introduce SCI SCMI SMC multi-agent driver

2025-03-11 Thread Grygorii Strashko

From: Oleksii Moisieiev 

This patch introduces SCI driver to support for ARM EL3 Trusted Firmware-A
(TF-A) which provides SCMI interface with multi-agnet support, as shown
below.

  +-+
  | |
  | EL3 TF-A SCMI   |
  +---+--+---+--+---+--+---++
  |shmem0 |  |shmem1 |  |shmem2 |  |shmemX |
  +-+-+  +---+---+  +--++  +---+---+
smc-id0 || |   |
agent0  || |   |
  +-v+-+---++
  |  | |   ||
  |  | |   ||
  +--+-+---++
 smc-id1 |  smc-id2|smc-idX|
 agent1  |  agent2 |agentX |
 | |   |
+v---+  +--v-+  +--v-+
||  ||  ||
| Dom0   |  | Dom1   |  | DomX   |
||  ||  ||
||  ||  ||
++  ++  ++

The EL3 SCMI multi-agent firmware expected to provide SCMI SMC/HVC shared
memory transport for every Agent in the system.

The SCMI Agent transport channel defined by pair:
 - smc-id: SMC/HVC id used for Doorbell
 - shmem: shared memory for messages transfer, Xen page aligned,
 p2m_mmio_direct_nc.

The follwoing SCMI Agents expected to be defined by SCMI FW to enable SCMI
multi-agent functionality under Xen:
- Xen manegement agent: trusted agents that accesses to the Base Protocol
commands to configure agent specific permissions
- OSPM VM agents: non-trusted agent, one for each Guest domain which is
  allowed direct HW access. At least one OSPM VM agent has to be provided
  by FW if HW is handled only by Dom0 or Driver Domain.

The EL3 SCMI FW expected to implement following Base protocol messages:
- BASE_DISCOVER_AGENT
- BASE_RESET_AGENT_CONFIGURATION (optional)
- BASE_SET_DEVICE_PERMISSIONS (optional)

The SCI SCMI SMC multi-agent driver implements following functionality:
- It's initialized based on the Host DT SCMI node (only one SCMI interface
is supported) which describes Xen management agent SCMI interface.

scmi_shm_0 : sram@47ff {
compatible = "arm,scmi-shmem";
reg = <0x0 0x47ff 0x0 0x1000>;
};
firmware {
scmi: scmi {
compatible = "arm,scmi-smc";
arm,smc-id = <0x8202>; // Xen manegement agent smc-id
\#address-cells = < 1>;
\#size-cells = < 0>;
\#access-controller - cells = < 1>;
shmem = <&scmi_shm_0>; // Xen manegement agent shmem

protocol@X{
};
};
};

- It obtains Xen specific SCMI Agent's configuration from the Host DT,
  probes Agents and build SCMI Agents list. The Agents configuration
  is taken from:

chosen {
  xen,scmi-secondary-agents = <
1 0x8203 &scmi_shm_1
2 0x8204 &scmi_shm_2
3 0x8205 &scmi_shm_3
4 0x8206 &scmi_shm_4>;
}

/{
scmi_shm_1: sram@47ff1000 {
compatible = "arm,scmi-shmem";
reg = <0x0 0x47ff1000 0x0 0x1000>;
};
scmi_shm_2: sram@47ff2000 {
compatible = "arm,scmi-shmem";
reg = <0x0 0x47ff2000 0x0 0x1000>;
};
scmi_shm_3: sram@47ff3000 {
compatible = "arm,scmi-shmem";
reg = <0x0 0x47ff3000 0x0 0x1000>;
};
}
  where first item is "agent_id", second - "arm,smc-id", and third - 
"arm,scmi-shmem" for
  this agent_id.

  Note that Xen is the only one entry in the system which need to know
  about SCMI multi-agent support.

- It implements the SCI subsystem interface required for configuring and
enabling SCMI functionality for Dom0/hwdom and Guest domains. To enable
SCMI functionality for domain it has to be configured with unique supported
SCMI Agent_id and use corresponding SCMI SMC/HVC shared memory transport
[smc-id, shmem] defined for this SCMI Agent_id.
- Once Xen domain is configured it can communicate with EL3 SCMI FW:
  -- zero-copy, the guest domain puts SCMI message in shmem;
  -- the guest triggers SMC/HVC exception with smc-id (doorbell);
  -- the Xen driver catches exception, do checks and synchronously forwards
  it to EL3 FW.
- the Xen driver sends BASE_RESET_AGENT_CONFIGURATION message to Xen
  management agent channel on domain destroy event. This allows to reset
  resources used by domain and so implement use-case like domain reboot.

Dom0 Enable SCMI SMC:
 - pass dom0_scmi_agent_id= in Xen command line option. if not 
provided
   SCMI will be disabled for Dom0 and all SCMI nodes removed from Dom0 DT.
   The driver updates Dom0 DT SCMI node "arm,smc-id" value and fix up shmem
   node according to assigned agent_id.

Guest domains enable SCMI SMC:
 - xl.cfg: add configuration option as below

   arm_sci = "type=scmi_smc_multiagent,agent_id=2"

 - xl.cf

Re: [PATCH v8 1/9] xen/events: don't allow binding a global virq from any domain

2025-03-11 Thread Julien Grall


Hi Juergen,

On 04/02/2025 11:33, Juergen Gross wrote:

Today Xen will happily allow binding a global virq by a domain which
isn't configured to receive it. This won't result in any bad actions,
but the bind will appear to have succeeded with no event ever being
received by that event channel.

Instead of allowing the bind, error out if the domain isn't set to
handle that virq. Note that this check is inside the write_lock() on
purpose, as a future patch will put a related check into
set_global_virq_handler() with the addition of using the same lock.

> > Signed-off-by: Juergen Gross 

I see this patch was already committed. But I have a question about the 
logic.



---
V6:
- new patch
V7:
- move handling domain check inside locked region (Jan Beulich)
- style fix (Jan Beulich)
---
  xen/common/event_channel.c | 21 +
  1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 46281b16ce..cd6f5a1211 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -120,6 +120,13 @@ static uint8_t 
get_xen_consumer(xen_event_channel_notification_t fn)
  /* Get the notification function for a given Xen-bound event channel. */
  #define xen_notification_fn(e) (xen_consumers[(e)->xen_consumer-1])
  
+static struct domain *__read_mostly global_virq_handlers[NR_VIRQS];

+
+static struct domain *get_global_virq_handler(unsigned int virq)
+{
+return global_virq_handlers[virq] ?: hardware_domain;
+}
+
  static bool virq_is_global(unsigned int virq)
  {
  switch ( virq )
@@ -469,6 +476,7 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, 
evtchn_port_t port)
  struct domain *d = current->domain;
  intvirq = bind->virq, vcpu = bind->vcpu;
  intrc = 0;
+bool   is_global;
  
  if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )

  return -EINVAL;
@@ -478,8 +486,9 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, 
evtchn_port_t port)
  * speculative execution.
  */
  virq = array_index_nospec(virq, ARRAY_SIZE(v->virq_to_evtchn));
+is_global = virq_is_global(virq);
  
-if ( virq_is_global(virq) && (vcpu != 0) )

+if ( is_global && vcpu != 0 )
  return -EINVAL;
  
  if ( (v = domain_vcpu(d, vcpu)) == NULL )

@@ -487,6 +496,12 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, 
evtchn_port_t port)
  
  write_lock(&d->event_lock);
  
+if ( is_global && get_global_virq_handler(virq) != d )


What prevent a race between get_global_virq_handler() and 
set_global_virq_handler()? Also, it is not clear in the implementation 
of get_global_virq_handler() that it will ever only read 
global_virq_handlers[virq] once.


Cheers,

--
Julien Grall

[PATCH v4 04/25] drm/gem-shmem: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch to a multiple of 8.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index d99dee67353a..849ee2cde990 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -514,18 +515,11 @@ EXPORT_SYMBOL(drm_gem_shmem_purge);
 int drm_gem_shmem_dumb_create(struct drm_file *file, struct drm_device *dev,
  struct drm_mode_create_dumb *args)
 {
-   u32 min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
+   int ret;
 
-   if (!args->pitch || !args->size) {
-   args->pitch = min_pitch;
-   args->size = PAGE_ALIGN(args->pitch * args->height);
-   } else {
-   /* ensure sane minimum values */
-   if (args->pitch < min_pitch)
-   args->pitch = min_pitch;
-   if (args->size < args->pitch * args->height)
-   args->size = PAGE_ALIGN(args->pitch * args->height);
-   }
+   ret = drm_mode_size_dumb(dev, args, SZ_8, 0);
+   if (ret)
+   return ret;
 
return drm_gem_shmem_create_with_handle(file, dev, args->size, 
&args->handle);
 }
-- 
2.48.1

[PATCH v4 00/25] drm/dumb-buffers: Fix and improve buffer-size calculation

2025-03-11 Thread Thomas Zimmermann

Dumb-buffer pitch and size is specified by width, height, bits-per-pixel
plus various hardware-specific alignments. The calculation of these
values is inconsistent and duplicated among drivers. The results for
formats with bpp < 8 are sometimes incorrect.

This series fixes this for most drivers. Default scanline pitch and
buffer size are now calculated with the existing 4CC helpers. There is
a new helper drm_mode_size_dumb() that calculates scanline pitch and
buffer size according to driver requirements.

The series fixes the common GEM implementations for DMA, SHMEM and
VRAM. It further changes most implementations of dumb_create to use
the new helper. A small number of drivers has more complicated
calculations and will be updated by a later patches.

v4:
- improve UAPI documentation
- document bpp special cases
- use drm_warn_once()
- add TODO lists
- armada: fix pitch alignment
v3:
- document UAPI semantics
- fall back to bpp-based allocation for unknown color modes
- cleanups
v2:
- rewrite series
- convert many individual drivers besides the shared GEM helpers

Thomas Zimmermann (25):
  drm/dumb-buffers: Sanitize output on errors
  drm/dumb-buffers: Provide helper to set pitch and size
  drm/gem-dma: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/gem-shmem: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/gem-vram: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/armada: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/exynos: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/gma500: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/hibmc: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/imx/ipuv3: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/loongson: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/mediatek: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/msm: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/nouveau: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/omapdrm: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/qxl: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/renesas/rcar-du: Compute dumb-buffer sizes with
drm_mode_size_dumb()
  drm/renesas/rz-du: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/rockchip: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/tegra: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/virtio: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/vmwgfx: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/xe: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/xen: Compute dumb-buffer sizes with drm_mode_size_dumb()
  drm/xlnx: Compute dumb-buffer sizes with drm_mode_size_dumb()

 Documentation/gpu/todo.rst|  28 +++
 drivers/gpu/drm/armada/armada_gem.c   |  16 +-
 drivers/gpu/drm/drm_dumb_buffers.c| 172 --
 drivers/gpu/drm/drm_gem_dma_helper.c  |   7 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c|  16 +-
 drivers/gpu/drm/drm_gem_vram_helper.c |  89 +++--
 drivers/gpu/drm/exynos/exynos_drm_gem.c   |   8 +-
 drivers/gpu/drm/gma500/gem.c  |  21 +--
 .../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c   |  25 ++-
 drivers/gpu/drm/imx/ipuv3/imx-drm-core.c  |  29 ++-
 drivers/gpu/drm/loongson/lsdc_gem.c   |  29 +--
 drivers/gpu/drm/mediatek/mtk_gem.c|  13 +-
 drivers/gpu/drm/msm/msm_gem.c |  27 ++-
 drivers/gpu/drm/nouveau/nouveau_display.c |   7 +-
 drivers/gpu/drm/omapdrm/omap_gem.c|  15 +-
 drivers/gpu/drm/qxl/qxl_dumb.c|  17 +-
 drivers/gpu/drm/renesas/rcar-du/rcar_du_kms.c |   7 +-
 drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c  |   7 +-
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c   |  12 +-
 drivers/gpu/drm/tegra/gem.c   |   8 +-
 drivers/gpu/drm/virtio/virtgpu_gem.c  |  11 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c   |  21 +--
 drivers/gpu/drm/xe/xe_bo.c|   8 +-
 drivers/gpu/drm/xen/xen_drm_front.c   |   7 +-
 drivers/gpu/drm/xlnx/zynqmp_kms.c |   7 +-
 include/drm/drm_dumb_buffers.h|  14 ++
 include/drm/drm_gem_vram_helper.h |   6 -
 include/uapi/drm/drm_mode.h   |  50 -
 28 files changed, 449 insertions(+), 228 deletions(-)
 create mode 100644 include/drm/drm_dumb_buffers.h

-- 
2.48.1

[PATCH v4 09/25] drm/hibmc: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch to a multiple of 128.

The hibmc driver's new hibmc_dumb_create() is similar to the one
in GEM VRAM helpers. The driver was the only caller of
drm_gem_vram_fill_create_dumb(). Remove the now unused helper.

Signed-off-by: Thomas Zimmermann 
Cc: Xinliang Liu 
Cc: Tian Tao 
Cc: Xinwei Kong 
Cc: Sumit Semwal 
Cc: Yongqin Liu 
Cc: John Stultz 
---
 drivers/gpu/drm/drm_gem_vram_helper.c | 65 ---
 .../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c   | 25 ++-
 include/drm/drm_gem_vram_helper.h |  6 --
 3 files changed, 24 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c 
b/drivers/gpu/drm/drm_gem_vram_helper.c
index 15cd564cbeac..b4cf8134df6d 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -444,71 +444,6 @@ void drm_gem_vram_vunmap(struct drm_gem_vram_object *gbo,
 }
 EXPORT_SYMBOL(drm_gem_vram_vunmap);
 
-/**
- * drm_gem_vram_fill_create_dumb() - Helper for implementing
- *  &struct drm_driver.dumb_create
- *
- * @file:  the DRM file
- * @dev:   the DRM device
- * @pg_align:  the buffer's alignment in multiples of the page size
- * @pitch_align:   the scanline's alignment in powers of 2
- * @args:  the arguments as provided to
- * &struct drm_driver.dumb_create
- *
- * This helper function fills &struct drm_mode_create_dumb, which is used
- * by &struct drm_driver.dumb_create. Implementations of this interface
- * should forwards their arguments to this helper, plus the driver-specific
- * parameters.
- *
- * Returns:
- * 0 on success, or
- * a negative error code otherwise.
- */
-int drm_gem_vram_fill_create_dumb(struct drm_file *file,
- struct drm_device *dev,
- unsigned long pg_align,
- unsigned long pitch_align,
- struct drm_mode_create_dumb *args)
-{
-   size_t pitch, size;
-   struct drm_gem_vram_object *gbo;
-   int ret;
-   u32 handle;
-
-   pitch = args->width * DIV_ROUND_UP(args->bpp, 8);
-   if (pitch_align) {
-   if (WARN_ON_ONCE(!is_power_of_2(pitch_align)))
-   return -EINVAL;
-   pitch = ALIGN(pitch, pitch_align);
-   }
-   size = pitch * args->height;
-
-   size = roundup(size, PAGE_SIZE);
-   if (!size)
-   return -EINVAL;
-
-   gbo = drm_gem_vram_create(dev, size, pg_align);
-   if (IS_ERR(gbo))
-   return PTR_ERR(gbo);
-
-   ret = drm_gem_handle_create(file, &gbo->bo.base, &handle);
-   if (ret)
-   goto err_drm_gem_object_put;
-
-   drm_gem_object_put(&gbo->bo.base);
-
-   args->pitch = pitch;
-   args->size = size;
-   args->handle = handle;
-
-   return 0;
-
-err_drm_gem_object_put:
-   drm_gem_object_put(&gbo->bo.base);
-   return ret;
-}
-EXPORT_SYMBOL(drm_gem_vram_fill_create_dumb);
-
 /*
  * Helpers for struct ttm_device_funcs
  */
diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c 
b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
index e6de6d5edf6b..81768577871f 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
@@ -18,10 +18,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -54,7 +56,28 @@ static irqreturn_t hibmc_interrupt(int irq, void *arg)
 static int hibmc_dumb_create(struct drm_file *file, struct drm_device *dev,
 struct drm_mode_create_dumb *args)
 {
-   return drm_gem_vram_fill_create_dumb(file, dev, 0, 128, args);
+   struct drm_gem_vram_object *gbo;
+   int ret;
+
+   ret = drm_mode_size_dumb(dev, args, SZ_128, 0);
+   if (ret)
+   return ret;
+
+   gbo = drm_gem_vram_create(dev, args->size, 0);
+   if (IS_ERR(gbo))
+   return PTR_ERR(gbo);
+
+   ret = drm_gem_handle_create(file, &gbo->bo.base, &args->handle);
+   if (ret)
+   goto err_drm_gem_object_put;
+
+   drm_gem_object_put(&gbo->bo.base);
+
+   return 0;
+
+err_drm_gem_object_put:
+   drm_gem_object_put(&gbo->bo.base);
+   return ret;
 }
 
 static const struct drm_driver hibmc_driver = {
diff --git a/include/drm/drm_gem_vram_helper.h 
b/include/drm/drm_gem_vram_helper.h
index 00830b49a3ff..b6e821f5dd03 100644
--- a/include/drm/drm_gem_vram_helper.h
+++ b/include/drm/drm_gem_vram_helper.h
@@ -100,12 +100,6 @@ int drm_gem_vram_vmap(struct drm_gem_vram_object *gbo, 
struct iosys_map *map);
 void drm_gem_vram_vunmap(struct drm_gem_vram_object *gbo,
 struct iosys_map *map);
 
-int drm_gem_vram_fill_create_dumb(struct drm_file *file,
-

[PATCH v4 22/25] drm/vmwgfx: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch
and buffer size. No alignment required.

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Zack Rusin 
Cc: Zack Rusin 
Cc: Broadcom internal kernel review list 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 21 -
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
index 02ab65cc63ec..c3a4ea713559 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
@@ -15,6 +15,7 @@
 #include "vmw_surface_cache.h"
 #include "device_include/svga3d_surfacedefs.h"
 
+#include 
 #include 
 
 #define SVGA3D_FLAGS_64(upper32, lower32) (((uint64_t)upper32 << 32) | lower32)
@@ -2270,23 +2271,9 @@ int vmw_dumb_create(struct drm_file *file_priv,
 * contents is going to be rendered guest side.
 */
if (!dev_priv->has_mob || !vmw_supports_3d(dev_priv)) {
-   int cpp = DIV_ROUND_UP(args->bpp, 8);
-
-   switch (cpp) {
-   case 1: /* DRM_FORMAT_C8 */
-   case 2: /* DRM_FORMAT_RGB565 */
-   case 4: /* DRM_FORMAT_XRGB */
-   break;
-   default:
-   /*
-* Dumb buffers don't allow anything else.
-* This is tested via IGT's dumb_buffers
-*/
-   return -EINVAL;
-   }
-
-   args->pitch = args->width * cpp;
-   args->size = ALIGN(args->pitch * args->height, PAGE_SIZE);
+   ret = drm_mode_size_dumb(dev, args, 0, 0);
+   if (ret)
+   return ret;
 
ret = vmw_gem_object_create_with_handle(dev_priv, file_priv,
args->size, 
&args->handle,
-- 
2.48.1

[PATCH v4 24/25] drm/xen: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch
and buffer size. Align the pitch to a multiple of 8.

Signed-off-by: Thomas Zimmermann 
Cc: Oleksandr Andrushchenko 
---
 drivers/gpu/drm/xen/xen_drm_front.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xen/xen_drm_front.c 
b/drivers/gpu/drm/xen/xen_drm_front.c
index 1bda7ef606cc..fd2f250fbc33 100644
--- a/drivers/gpu/drm/xen/xen_drm_front.c
+++ b/drivers/gpu/drm/xen/xen_drm_front.c
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -414,8 +415,10 @@ static int xen_drm_drv_dumb_create(struct drm_file *filp,
 * object without pages etc.
 * For details also see drm_gem_handle_create
 */
-   args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
-   args->size = args->pitch * args->height;
+
+   ret = drm_mode_size_dumb(dev, args, SZ_8, 0);
+   if (ret)
+   return ret;
 
obj = xen_drm_front_gem_create(dev, args->size);
if (IS_ERR(obj)) {
-- 
2.48.1

[PATCH v4 03/25] drm/gem-dma: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch to a multiple of 8.

Push the current calculation into the only direct caller imx. Imx's
hardware requires the framebuffer width to be aligned to 8. The
driver's current approach is actually incorrect, as it only guarantees
this implicitly and requires bpp to be a multiple of 8 already. A
later commit will fix this problem by aligning the scanline pitch
such that an aligned width still fits into each scanline's memory.

A number of other drivers are build on top of gem-dma helpers and
implement their own dumb-buffer allocation. These drivers invoke
drm_gem_dma_dumb_create_internal(), which is not affected by this
commit.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_gem_dma_helper.c | 7 +--
 drivers/gpu/drm/imx/ipuv3/imx-drm-core.c | 2 ++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c 
b/drivers/gpu/drm/drm_gem_dma_helper.c
index b7f033d4352a..49be9b033610 100644
--- a/drivers/gpu/drm/drm_gem_dma_helper.c
+++ b/drivers/gpu/drm/drm_gem_dma_helper.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -304,9 +305,11 @@ int drm_gem_dma_dumb_create(struct drm_file *file_priv,
struct drm_mode_create_dumb *args)
 {
struct drm_gem_dma_object *dma_obj;
+   int ret;
 
-   args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
-   args->size = args->pitch * args->height;
+   ret = drm_mode_size_dumb(drm, args, SZ_8, 0);
+   if (ret)
+   return ret;
 
dma_obj = drm_gem_dma_create_with_handle(file_priv, drm, args->size,
 &args->handle);
diff --git a/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c 
b/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c
index ec5fd9a01f1e..e7025df7b978 100644
--- a/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c
+++ b/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c
@@ -145,6 +145,8 @@ static int imx_drm_dumb_create(struct drm_file *file_priv,
int ret;
 
args->width = ALIGN(width, 8);
+   args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
+   args->size = args->pitch * args->height;
 
ret = drm_gem_dma_dumb_create(file_priv, drm, args);
if (ret)
-- 
2.48.1

[PATCH v4 13/25] drm/msm: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch
and buffer size. Alignment is specified in bytes, but the hardware
requires the scanline pitch to be a multiple of 32 pixels. Therefore
compute the byte size of 32 pixels in the given color mode and align
the pitch accordingly. This replaces the existing code in the driver's
align_pitch() helper.

v3:
- clarify pitch alignment in commit message (Dmitry)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Dmitry Baryshkov 
Cc: Rob Clark 
Cc: Abhinav Kumar 
Cc: Dmitry Baryshkov 
Cc: Sean Paul 
Cc: Marijn Suijten 
---
 drivers/gpu/drm/msm/msm_gem.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index ebc9ba66efb8..a956905f1ef2 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -11,8 +11,10 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -700,8 +702,29 @@ void msm_gem_unpin_iova(struct drm_gem_object *obj,
 int msm_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
struct drm_mode_create_dumb *args)
 {
-   args->pitch = align_pitch(args->width, args->bpp);
-   args->size  = PAGE_ALIGN(args->pitch * args->height);
+   u32 fourcc;
+   const struct drm_format_info *info;
+   u64 pitch_align;
+   int ret;
+
+   /*
+* Adreno needs pitch aligned to 32 pixels. Compute the number
+* of bytes for a block of 32 pixels at the given color format.
+* Use the result as pitch alignment.
+*/
+   fourcc = drm_driver_color_mode_format(dev, args->bpp);
+   if (fourcc == DRM_FORMAT_INVALID)
+   return -EINVAL;
+   info = drm_format_info(fourcc);
+   if (!info)
+   return -EINVAL;
+   pitch_align = drm_format_info_min_pitch(info, 0, SZ_32);
+   if (!pitch_align || pitch_align > U32_MAX)
+   return -EINVAL;
+   ret = drm_mode_size_dumb(dev, args, pitch_align, 0);
+   if (ret)
+   return ret;
+
return msm_gem_new_handle(dev, file, args->size,
MSM_BO_SCANOUT | MSM_BO_WC, &args->handle, "dumb");
 }
-- 
2.48.1

[PATCH v4 10/25] drm/imx/ipuv3: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. The hardware requires the framebuffer width to be a
multiple of 8. The scanline pitch has be large enough to support
this. Therefore compute the byte size of 8 pixels in the given color
mode and align the pitch accordingly.

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Philipp Zabel 
Cc: Philipp Zabel 
Cc: Shawn Guo 
Cc: Sascha Hauer 
Cc: Pengutronix Kernel Team 
Cc: Fabio Estevam 
---
 drivers/gpu/drm/imx/ipuv3/imx-drm-core.c | 31 ++--
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c 
b/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c
index e7025df7b978..465b5a6ad5bb 100644
--- a/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c
+++ b/drivers/gpu/drm/imx/ipuv3/imx-drm-core.c
@@ -17,7 +17,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -141,19 +143,32 @@ static int imx_drm_dumb_create(struct drm_file *file_priv,
   struct drm_device *drm,
   struct drm_mode_create_dumb *args)
 {
-   u32 width = args->width;
+   u32 fourcc;
+   const struct drm_format_info *info;
+   u64 pitch_align;
int ret;
 
-   args->width = ALIGN(width, 8);
-   args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
-   args->size = args->pitch * args->height;
-
-   ret = drm_gem_dma_dumb_create(file_priv, drm, args);
+   /*
+* Hardware requires the framebuffer width to be aligned to
+* multiples of 8. The mode-setting code handles this, but
+* the buffer pitch has to be aligned as well. Set the pitch
+* alignment accordingly, so that the each scanline fits into
+* the allocated buffer.
+*/
+   fourcc = drm_driver_color_mode_format(drm, args->bpp);
+   if (fourcc == DRM_FORMAT_INVALID)
+   return -EINVAL;
+   info = drm_format_info(fourcc);
+   if (!info)
+   return -EINVAL;
+   pitch_align = drm_format_info_min_pitch(info, 0, SZ_8);
+   if (!pitch_align || pitch_align > U32_MAX)
+   return -EINVAL;
+   ret = drm_mode_size_dumb(drm, args, pitch_align, 0);
if (ret)
return ret;
 
-   args->width = width;
-   return ret;
+   return drm_gem_dma_dumb_create(file_priv, drm, args);
 }
 
 static const struct drm_driver imx_drm_driver = {
-- 
2.48.1

[PATCH v4 19/25] drm/rockchip: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch to a multiple of 64.

Signed-off-by: Thomas Zimmermann 
Acked-by: Heiko Stuebner 
Cc: Sandy Huang 
Cc: "Heiko Stübner" 
Cc: Andy Yan 
---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index 6330b883efc3..3bd06202e232 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -9,6 +9,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -403,13 +404,12 @@ int rockchip_gem_dumb_create(struct drm_file *file_priv,
 struct drm_mode_create_dumb *args)
 {
struct rockchip_gem_object *rk_obj;
-   int min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
+   int ret;
 
-   /*
-* align to 64 bytes since Mali requires it.
-*/
-   args->pitch = ALIGN(min_pitch, 64);
-   args->size = args->pitch * args->height;
+   /* 64-byte alignment required by Mali */
+   ret = drm_mode_size_dumb(dev, args, SZ_64, 0);
+   if (ret)
+   return ret;
 
rk_obj = rockchip_gem_create_with_handle(file_priv, dev, args->size,
 &args->handle);
-- 
2.48.1

[PATCH v4 15/25] drm/omapdrm: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch to a multiple of 8.

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Tomi Valkeinen 
Cc: Tomi Valkeinen 
---
 drivers/gpu/drm/omapdrm/omap_gem.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c 
b/drivers/gpu/drm/omapdrm/omap_gem.c
index b9c67e4ca360..b8413a2dcdeb 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 
@@ -583,15 +584,13 @@ static int omap_gem_object_mmap(struct drm_gem_object 
*obj, struct vm_area_struc
 int omap_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
struct drm_mode_create_dumb *args)
 {
-   union omap_gem_size gsize;
-
-   args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
-
-   args->size = PAGE_ALIGN(args->pitch * args->height);
+   union omap_gem_size gsize = { };
+   int ret;
 
-   gsize = (union omap_gem_size){
-   .bytes = args->size,
-   };
+   ret = drm_mode_size_dumb(dev, args, SZ_8, 0);
+   if (ret)
+   return ret;
+   gsize.bytes = args->size;
 
return omap_gem_new_handle(dev, file, gsize,
OMAP_BO_SCANOUT | OMAP_BO_WC, &args->handle);
-- 
2.48.1

[PATCH v4 18/25] drm/renesas/rz-du: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch according to hardware requirements.

Signed-off-by: Thomas Zimmermann 
Cc: Biju Das 
---
 drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c 
b/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c
index 90c6269ccd29..f752369cd52f 100644
--- a/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c
+++ b/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c
@@ -75,10 +75,11 @@ const struct rzg2l_du_format_info *rzg2l_du_format_info(u32 
fourcc)
 int rzg2l_du_dumb_create(struct drm_file *file, struct drm_device *dev,
 struct drm_mode_create_dumb *args)
 {
-   unsigned int min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
-   unsigned int align = 16 * args->bpp / 8;
+   int ret;
 
-   args->pitch = roundup(min_pitch, align);
+   ret = drm_mode_size_dumb(dev, args, 16 * args->bpp / 8, 0);
+   if (ret)
+   return ret;
 
return drm_gem_dma_dumb_create_internal(file, dev, args);
 }
-- 
2.48.1

[PATCH v4 20/25] drm/tegra: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch according to hardware requirements.

Signed-off-by: Thomas Zimmermann 
Acked-by: Thierry Reding 
Cc: Thierry Reding 
Cc: Mikko Perttunen 
---
 drivers/gpu/drm/tegra/gem.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index ace3e5a805cf..4d88e71192fb 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -16,6 +16,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
@@ -543,12 +544,13 @@ void tegra_bo_free_object(struct drm_gem_object *gem)
 int tegra_bo_dumb_create(struct drm_file *file, struct drm_device *drm,
 struct drm_mode_create_dumb *args)
 {
-   unsigned int min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
struct tegra_drm *tegra = drm->dev_private;
struct tegra_bo *bo;
+   int ret;
 
-   args->pitch = round_up(min_pitch, tegra->pitch_align);
-   args->size = args->pitch * args->height;
+   ret = drm_mode_size_dumb(drm, args, tegra->pitch_align, 0);
+   if (ret)
+   return ret;
 
bo = tegra_bo_create_with_handle(file, drm, args->size, 0,
 &args->handle);
-- 
2.48.1

[PATCH v4 11/25] drm/loongson: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch according to hardware requirements.

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Sui Jingfeng 
Cc: Sui Jingfeng 
---
 drivers/gpu/drm/loongson/lsdc_gem.c | 29 -
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/loongson/lsdc_gem.c 
b/drivers/gpu/drm/loongson/lsdc_gem.c
index a720d8f53209..9f982b85301f 100644
--- a/drivers/gpu/drm/loongson/lsdc_gem.c
+++ b/drivers/gpu/drm/loongson/lsdc_gem.c
@@ -6,6 +6,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -204,45 +205,31 @@ int lsdc_dumb_create(struct drm_file *file, struct 
drm_device *ddev,
const struct lsdc_desc *descp = ldev->descp;
u32 domain = LSDC_GEM_DOMAIN_VRAM;
struct drm_gem_object *gobj;
-   size_t size;
-   u32 pitch;
-   u32 handle;
int ret;
 
-   if (!args->width || !args->height)
-   return -EINVAL;
-
-   if (args->bpp != 32 && args->bpp != 16)
-   return -EINVAL;
-
-   pitch = args->width * args->bpp / 8;
-   pitch = ALIGN(pitch, descp->pitch_align);
-   size = pitch * args->height;
-   size = ALIGN(size, PAGE_SIZE);
+   ret = drm_mode_size_dumb(ddev, args, descp->pitch_align, 0);
+   if (ret)
+   return ret;
 
/* Maximum single bo size allowed is the half vram size available */
-   if (size > ldev->vram_size / 2) {
-   drm_err(ddev, "Requesting(%zuMiB) failed\n", size >> 20);
+   if (args->size > ldev->vram_size / 2) {
+   drm_err(ddev, "Requesting(%zuMiB) failed\n", 
(size_t)(args->size >> PAGE_SHIFT));
return -ENOMEM;
}
 
-   gobj = lsdc_gem_object_create(ddev, domain, size, false, NULL, NULL);
+   gobj = lsdc_gem_object_create(ddev, domain, args->size, false, NULL, 
NULL);
if (IS_ERR(gobj)) {
drm_err(ddev, "Failed to create gem object\n");
return PTR_ERR(gobj);
}
 
-   ret = drm_gem_handle_create(file, gobj, &handle);
+   ret = drm_gem_handle_create(file, gobj, &args->handle);
 
/* drop reference from allocate, handle holds it now */
drm_gem_object_put(gobj);
if (ret)
return ret;
 
-   args->pitch = pitch;
-   args->size = size;
-   args->handle = handle;
-
return 0;
 }
 
-- 
2.48.1

[PATCH v4 05/25] drm/gem-vram: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Inline code from drm_gem_vram_fill_create_dumb() without
the existing size computation. Align the pitch to a multiple of 8.

Only hibmc and vboxvideo use gem-vram. Hibmc invokes the call to
drm_gem_vram_fill_create_dumb() directly and is therefore not affected.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_gem_vram_helper.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c 
b/drivers/gpu/drm/drm_gem_vram_helper.c
index 22b1fe9c03b8..15cd564cbeac 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -582,10 +583,31 @@ int drm_gem_vram_driver_dumb_create(struct drm_file *file,
struct drm_device *dev,
struct drm_mode_create_dumb *args)
 {
+   struct drm_gem_vram_object *gbo;
+   int ret;
+
if (WARN_ONCE(!dev->vram_mm, "VRAM MM not initialized"))
return -EINVAL;
 
-   return drm_gem_vram_fill_create_dumb(file, dev, 0, 0, args);
+   ret = drm_mode_size_dumb(dev, args, SZ_8, 0);
+   if (ret)
+   return ret;
+
+   gbo = drm_gem_vram_create(dev, args->size, 0);
+   if (IS_ERR(gbo))
+   return PTR_ERR(gbo);
+
+   ret = drm_gem_handle_create(file, &gbo->bo.base, &args->handle);
+   if (ret)
+   goto err_drm_gem_object_put;
+
+   drm_gem_object_put(&gbo->bo.base);
+
+   return 0;
+
+err_drm_gem_object_put:
+   drm_gem_object_put(&gbo->bo.base);
+   return ret;
 }
 EXPORT_SYMBOL(drm_gem_vram_driver_dumb_create);
 
-- 
2.48.1

[PATCH v4 23/25] drm/xe: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch
and buffer size. Align the pitch to a multiple of 8. Align the
buffer size according to hardware requirements.

Xe's internal calculation allowed for 64-bit wide buffer sizes, but
the ioctl's internal checks always verified against 32-bit wide limits.
Hance, it is safe to limit the driver code to 32-bit calculations as
well.

v3:
- mention 32-bit calculation in commit description (Matthew)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Matthew Auld 
Cc: Lucas De Marchi 
Cc: "Thomas Hellström" 
Cc: Rodrigo Vivi 
---
 drivers/gpu/drm/xe/xe_bo.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 64f9c936eea0..471aab61176e 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -9,6 +9,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2910,14 +2911,13 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
struct xe_device *xe = to_xe_device(dev);
struct xe_bo *bo;
uint32_t handle;
-   int cpp = DIV_ROUND_UP(args->bpp, 8);
int err;
u32 page_size = max_t(u32, PAGE_SIZE,
xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : SZ_4K);
 
-   args->pitch = ALIGN(args->width * cpp, 64);
-   args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
-  page_size);
+   err = drm_mode_size_dumb(dev, args, SZ_64, page_size);
+   if (err)
+   return err;
 
bo = xe_bo_create_user(xe, NULL, NULL, args->size,
   DRM_XE_GEM_CPU_CACHING_WC,
-- 
2.48.1

[PATCH v4 14/25] drm/nouveau: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch to a multiple of 256.

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Lyude Paul 
Cc: Karol Herbst 
Cc: Lyude Paul 
Cc: Danilo Krummrich 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index add006fc8d81..daa2528f9c9a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -808,9 +809,9 @@ nouveau_display_dumb_create(struct drm_file *file_priv, 
struct drm_device *dev,
uint32_t domain;
int ret;
 
-   args->pitch = roundup(args->width * (args->bpp / 8), 256);
-   args->size = args->pitch * args->height;
-   args->size = roundup(args->size, PAGE_SIZE);
+   ret = drm_mode_size_dumb(dev, args, SZ_256, 0);
+   if (ret)
+   return ret;
 
/* Use VRAM if there is any ; otherwise fallback to system memory */
if (nouveau_drm(dev)->client.device.info.ram_size != 0)
-- 
2.48.1

[PATCH v4 12/25] drm/mediatek: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch to a multiple of 8.

Signed-off-by: Thomas Zimmermann 
Cc: Chun-Kuang Hu 
Cc: Philipp Zabel 
Cc: Matthias Brugger 
Cc: AngeloGioacchino Del Regno 
---
 drivers/gpu/drm/mediatek/mtk_gem.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_gem.c 
b/drivers/gpu/drm/mediatek/mtk_gem.c
index a172456d1d7b..21e08fabfd7f 100644
--- a/drivers/gpu/drm/mediatek/mtk_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_gem.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -124,15 +125,9 @@ int mtk_gem_dumb_create(struct drm_file *file_priv, struct 
drm_device *dev,
struct mtk_gem_obj *mtk_gem;
int ret;
 
-   args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
-
-   /*
-* Multiply 2 variables of different types,
-* for example: args->size = args->spacing * args->height;
-* may cause coverity issue with unintentional overflow.
-*/
-   args->size = args->pitch;
-   args->size *= args->height;
+   ret = drm_mode_size_dumb(dev, args, SZ_8, 0);
+   if (ret)
+   return ret;
 
mtk_gem = mtk_gem_create(dev, args->size, false);
if (IS_ERR(mtk_gem))
-- 
2.48.1

[PATCH v4 07/25] drm/exynos: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. No alignment required.

Signed-off-by: Thomas Zimmermann 
Cc: Inki Dae 
Cc: Seung-Woo Kim 
Cc: Kyungmin Park 
Cc: Krzysztof Kozlowski 
Cc: Alim Akhtar 
---
 drivers/gpu/drm/exynos/exynos_drm_gem.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c 
b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index 4787fee4696f..ffa1c02b4b1e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -330,15 +331,16 @@ int exynos_drm_gem_dumb_create(struct drm_file *file_priv,
unsigned int flags;
int ret;
 
+   ret = drm_mode_size_dumb(dev, args, 0, 0);
+   if (ret)
+   return ret;
+
/*
 * allocate memory to be used for framebuffer.
 * - this callback would be called by user application
 *  with DRM_IOCTL_MODE_CREATE_DUMB command.
 */
 
-   args->pitch = args->width * ((args->bpp + 7) / 8);
-   args->size = args->pitch * args->height;
-
if (is_drm_iommu_supported(dev))
flags = EXYNOS_BO_NONCONTIG | EXYNOS_BO_WC;
else
-- 
2.48.1

Re: [PATCH v1] xen/riscv: add H extenstion to -march

2025-03-11 Thread Oleksii Kurochko



On 3/11/25 4:48 PM, Jan Beulich wrote:

On 11.03.2025 16:45, Oleksii Kurochko wrote:

--- a/xen/arch/riscv/arch.mk
+++ b/xen/arch/riscv/arch.mk
@@ -9,7 +9,8 @@ riscv-abi-$(CONFIG_RISCV_64) := -mabi=lp64
  riscv-march-$(CONFIG_RISCV_64) := rv64
  riscv-march-y += ima
  riscv-march-$(CONFIG_RISCV_ISA_C) += c
-riscv-march-y += _zicsr_zifencei_zbb
+h-extension-name := $(call cc-ifversion,-lt,1301, hh, h)

Instead of a version check, did you consider probing the compiler? With the
hard-coded version, how are things going to work with Clang?


Initially, it was implemented using:

+$(h-extension-name)-insn := "hfence.gvma"
+$(call check-extension,$(h-extension-name))

with

+h-extension-name := $(call cc-ifversion,-lt,1301, hh, h)

But it seems that we still need this hard-coded version for h-extension-name
because of this behavior that h extensions should be longer then single letter
for some compilers.

Probably, we could consider option to check "hfence.gvma" twice:
h-insn := "hfence.gvma"
$(call check-extension,h)

hh-insn := "hfence.gvma"
$(call check-extension,hh)

And filter-out/skip the second check-extension if the previous one check 
succeeded.
But it looks a weird.

~ Oleksii

[PATCH v4 06/25] drm/armada: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch to a multiple of 128.

v4:
- align pitch to 128 bytes (Russell)

Signed-off-by: Thomas Zimmermann 
Cc: Russell King 
---
 drivers/gpu/drm/armada/armada_gem.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/armada/armada_gem.c 
b/drivers/gpu/drm/armada/armada_gem.c
index 1a1680d71486..a767c4edd5c8 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -9,6 +9,7 @@
 #include 
 
 #include 
+#include 
 #include 
 
 #include "armada_drm.h"
@@ -244,14 +245,13 @@ int armada_gem_dumb_create(struct drm_file *file, struct 
drm_device *dev,
struct drm_mode_create_dumb *args)
 {
struct armada_gem_object *dobj;
-   u32 handle;
-   size_t size;
int ret;
 
-   args->pitch = armada_pitch(args->width, args->bpp);
-   args->size = size = args->pitch * args->height;
+   ret = drm_mode_size_dumb(dev, args, SZ_128, 0);
+   if (ret)
+   return ret;
 
-   dobj = armada_gem_alloc_private_object(dev, size);
+   dobj = armada_gem_alloc_private_object(dev, args->size);
if (dobj == NULL)
return -ENOMEM;
 
@@ -259,14 +259,12 @@ int armada_gem_dumb_create(struct drm_file *file, struct 
drm_device *dev,
if (ret)
goto err;
 
-   ret = drm_gem_handle_create(file, &dobj->obj, &handle);
+   ret = drm_gem_handle_create(file, &dobj->obj, &args->handle);
if (ret)
goto err;
 
-   args->handle = handle;
-
/* drop reference from allocate - handle holds it now */
-   DRM_DEBUG_DRIVER("obj %p size %zu handle %#x\n", dobj, size, handle);
+   DRM_DEBUG_DRIVER("obj %p size %llu handle %#x\n", dobj, args->size, 
args->handle);
  err:
drm_gem_object_put(&dobj->obj);
return ret;
-- 
2.48.1

[PATCH v4 16/25] drm/qxl: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch
and buffer size. No alignment required.

Signed-off-by: Thomas Zimmermann 
Cc: Dave Airlie 
Cc: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_dumb.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_dumb.c b/drivers/gpu/drm/qxl/qxl_dumb.c
index 17df5c7ccf69..1200946767ce 100644
--- a/drivers/gpu/drm/qxl/qxl_dumb.c
+++ b/drivers/gpu/drm/qxl/qxl_dumb.c
@@ -23,6 +23,8 @@
  *  Alon Levy
  */
 
+#include 
+
 #include "qxl_drv.h"
 #include "qxl_object.h"
 
@@ -35,14 +37,13 @@ int qxl_mode_dumb_create(struct drm_file *file_priv,
struct qxl_device *qdev = to_qxl(dev);
struct qxl_bo *qobj;
struct drm_gem_object *gobj;
-   uint32_t handle;
int r;
struct qxl_surface surf;
-   uint32_t pitch, format;
+   u32 format;
 
-   pitch = args->width * ((args->bpp + 1) / 8);
-   args->size = pitch * args->height;
-   args->size = ALIGN(args->size, PAGE_SIZE);
+   r = drm_mode_size_dumb(dev, args, 0, 0);
+   if (r)
+   return r;
 
switch (args->bpp) {
case 16:
@@ -57,20 +58,18 @@ int qxl_mode_dumb_create(struct drm_file *file_priv,
 
surf.width = args->width;
surf.height = args->height;
-   surf.stride = pitch;
+   surf.stride = args->pitch;
surf.format = format;
surf.data = 0;
 
r = qxl_gem_object_create_with_handle(qdev, file_priv,
  QXL_GEM_DOMAIN_CPU,
  args->size, &surf, &gobj,
- &handle);
+ &args->handle);
if (r)
return r;
qobj = gem_to_qxl_bo(gobj);
qobj->is_dumb = true;
drm_gem_object_put(gobj);
-   args->pitch = pitch;
-   args->handle = handle;
return 0;
 }
-- 
2.48.1

Re: [PATCH v1] xen/riscv: add H extenstion to -march

2025-03-11 Thread Jan Beulich

On 11.03.2025 16:45, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/arch.mk
> +++ b/xen/arch/riscv/arch.mk
> @@ -9,7 +9,8 @@ riscv-abi-$(CONFIG_RISCV_64) := -mabi=lp64
>  riscv-march-$(CONFIG_RISCV_64) := rv64
>  riscv-march-y += ima
>  riscv-march-$(CONFIG_RISCV_ISA_C) += c
> -riscv-march-y += _zicsr_zifencei_zbb
> +h-extension-name := $(call cc-ifversion,-lt,1301, hh, h)

Instead of a version check, did you consider probing the compiler? With the
hard-coded version, how are things going to work with Clang?

Jan

Re: [PATCH v4 1/3] x86/vmx: fix posted interrupts usage of msi_desc->msg field

2025-03-11 Thread Jan Beulich

On 11.03.2025 13:06, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -396,6 +396,13 @@ static int cf_check vmx_pi_update_irte(const struct vcpu 
> *v,
>  const struct pi_desc *pi_desc = v ? &v->arch.hvm.vmx.pi_desc : NULL;
>  struct irq_desc *desc;
>  struct msi_desc *msi_desc;
> +/*
> + * vmx_pi_update_irte() relies on the IRTE already being setup, and just
> + * updates the guest vector, but not the other IRTE fields.  As such the
> + * contents of msg are not consumed by iommu_update_ire_from_msi().  Even
> + * if not consumed, zero the contents to avoid possible stack leaks.
> + */
> +struct msi_msg msg = {};

What the comment says is true only when pi_desc != NULL. As can be seen in
context above, it can very well be NULL here, though (which isn't to say
that I'm convinced the NULL case is handled correctly here). I'd view it as
more safe anyway if you set msg from msi_desc->msg.

Jan

Re: [PATCH v2] xen/page_alloc: Simplify domain_adjust_tot_pages

2025-03-11 Thread Roger Pau Monné

On Tue, Mar 11, 2025 at 02:53:04PM +, Alejandro Vallejo wrote:
> On Tue Mar 11, 2025 at 12:46 PM GMT, Roger Pau Monné wrote:
> > On Tue, Mar 04, 2025 at 11:10:00AM +, Alejandro Vallejo wrote:
> > > The logic has too many levels of indirection and it's very hard to
> > > understand it its current form. Split it between the corner case where
> > > the adjustment is bigger than the current claim and the rest to avoid 5
> > > auxiliary variables.
> > > 
> > > Add a functional change to prevent negative adjustments from
> > > re-increasing the claim. This has the nice side effect of avoiding
> > > taking the heap lock here on every free.
> > > 
> > > While at it, fix incorrect field name in nearby comment.
> > > 
> > > Signed-off-by: Alejandro Vallejo 
> >
> > Acked-by: Roger Pau Monné 
> 
> Thanks.
> 
> > I think it would be nice to also ensure that once the domain is
> > finished building (maybe when it's unpaused for the first
> > time?) d->outstanding_pages is set to 0.  IMO the claim system was
> > designed to avoid races during domain building, and shouldn't be used
> > once the domain is already running.
> >
> > Thanks, Roger.
> 
> As a matter of implementation that's already the case by toolstack being 
> "nice"
> and unconditionally clearing claims after populating the physmap.

I see.  Another option would be to refuse the unpause a domain if it
still has pending claims.  However I don't know how that will work out
with all possible toolstacks.

> However, I agree the hypervisor should do it on its own. I didn't find a
> suitable place for it. 

You could do it in arch_domain_creation_finished().

> It may be possible to do it on every unpause with
> minimal overhead because it doesn't need to take the heap lock unless there 
> are
> outstanding claims on the domains. Would that sound like an ok idea? I'd 
> rather
> not add even more state into "struct domain" to count pauses...

There's another side missing IMO, XENMEM_claim_pages should return an
error (and refuse the set any claims) once d->creation_finished ==
true.

Thanks, Roger.

[PATCH v1] xen/riscv: add H extenstion to -march

2025-03-11 Thread Oleksii Kurochko

H provides additional instructions and CSRs that control the new stage of
address translation and support hosting a guest OS in virtual S-mode
(VS-mode).

According to the Unprivileged Architecture (version 20240411) specification:
```
Table 74 summarizes the standardized extension names. The table also defines
the canonical order in which extension names must appear in the name string,
with top-to-bottom in table indicating first-to-last in the name string, e.g.,
RV32IMACV is legal, whereas RV32IMAVC is not.
```
According to Table 74, the h extension is placed last in the one-letter
extensions name part of the ISA string.

`h` is a standalone extension based on the patch [1] but it wasn't so
before.
As the minimal supported GCC version to build Xen for RISC-V is 12.2.0,
and for that version, h is still considered a prefix for the hypervisor
extension but the name of hypervisor extension must be more then 1 letter
extension, a workaround ( with using `hh` as an H extension name ) is
implemented as otherwise the following compilation error will occur:
 error: '-march=rv64gc_h_zbb_zihintpause': name of hypervisor extension
must be more than 1 letter

After GCC version 13.1.0, the commit [1] introducing H extension support
allows us to drop the workaround with `hh` as hypervisor extension name
and use only one h in -march.

To implement this, the h-extension-name is introduced, which is filled with
hh or h depending on the GCC version.

[1] 
https://github.com/gcc-mirror/gcc/commit/0cd11d301013af50a3fae0694c909952e94e20d5#diff-d6f7db0db31bfb339b01bec450f1b905381eb4730cc5ab2b2794971e34647d64R148

Signed-off-by: Oleksii Kurochko 
---
 docs/misc/riscv/booting.txt | 4 
 xen/arch/riscv/arch.mk  | 3 ++-
 xen/arch/riscv/cpufeature.c | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/misc/riscv/booting.txt b/docs/misc/riscv/booting.txt
index cb4d79f12c..3a8474a27d 100644
--- a/docs/misc/riscv/booting.txt
+++ b/docs/misc/riscv/booting.txt
@@ -3,6 +3,10 @@ System requirements
 
 The following extensions are expected to be supported by a system on which
 Xen is run:
+- H:
+  Provides additional instructions and CSRs that control the new stage of
+  address translation and support hosting a guest OS in virtual S-mode
+  (VS-mode).
 - Zbb:
   RISC-V doesn't have a CLZ instruction in the base ISA.
   As a consequence, __builtin_ffs() emits a library call to ffs() on GCC,
diff --git a/xen/arch/riscv/arch.mk b/xen/arch/riscv/arch.mk
index 236ea7c8a6..14d1f0ada0 100644
--- a/xen/arch/riscv/arch.mk
+++ b/xen/arch/riscv/arch.mk
@@ -9,7 +9,8 @@ riscv-abi-$(CONFIG_RISCV_64) := -mabi=lp64
 riscv-march-$(CONFIG_RISCV_64) := rv64
 riscv-march-y += ima
 riscv-march-$(CONFIG_RISCV_ISA_C) += c
-riscv-march-y += _zicsr_zifencei_zbb
+h-extension-name := $(call cc-ifversion,-lt,1301, hh, h)
+riscv-march-y += $(h-extension-name)_zicsr_zifencei_zbb
 
 riscv-generic-flags := $(riscv-abi-y) -march=$(subst 
$(space),,$(riscv-march-y))
 
diff --git a/xen/arch/riscv/cpufeature.c b/xen/arch/riscv/cpufeature.c
index bf09aa1170..5aafab0f49 100644
--- a/xen/arch/riscv/cpufeature.c
+++ b/xen/arch/riscv/cpufeature.c
@@ -146,6 +146,7 @@ static const struct riscv_isa_ext_data __initconst 
required_extensions[] = {
 #ifdef CONFIG_RISCV_ISA_C
 RISCV_ISA_EXT_DATA(c),
 #endif
+RISCV_ISA_EXT_DATA(h),
 RISCV_ISA_EXT_DATA(zicsr),
 RISCV_ISA_EXT_DATA(zifencei),
 RISCV_ISA_EXT_DATA(zihintpause),
-- 
2.48.1

Re: [PATCH v5 1/3] x86/vmx: fix posted interrupts usage of msi_desc->msg field

2025-03-11 Thread Jan Beulich

On 11.03.2025 16:27, Roger Pau Monne wrote:
> The current usage of msi_desc->msg in vmx_pi_update_irte() will make the
> field contain a translated MSI message, instead of the expected
> untranslated one.  This breaks dump_msi(), that use the data in
> msi_desc->msg to print the interrupt details.
> 
> Fix this by introducing a dummy local msi_msg, and use it with
> iommu_update_ire_from_msi().  vmx_pi_update_irte() relies on the MSI
> message not changing, so there's no need to propagate the resulting msi_msg
> to the hardware, and the contents can be ignored.
> 
> Additionally add a comment to clarify that msi_desc->msg must always
> contain the untranslated MSI message.
> 
> Fixes: a5e25908d18d ('VT-d: introduce new fields in msi_desc to track binding 
> with guest interrupt')
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich

Re: [RFC PATCH v3 7/7] xen/arm: scmi: generate scmi dt node for DomUs

2025-03-11 Thread Jan Beulich

On 11.03.2025 12:16, Grygorii Strashko wrote:
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -1223,6 +1223,13 @@ struct xen_domctl_vmtrace_op {
>  #define XEN_DOMCTL_vmtrace_get_option 5
>  #define XEN_DOMCTL_vmtrace_set_option 6
>  };
> +
> +/* XEN_DOMCTL_get_sci_info */
> +struct xen_domctl_sci_info {
> +uint64_t paddr;
> +uint32_t func_id;
> +};

Please take a look at the rest of this header: Outside of x86-specific
sub-ops there's no use of uint64_t; uint64_aligned_t wants using instead.

> @@ -1333,6 +1340,9 @@ struct xen_domctl {
>  #define XEN_DOMCTL_dt_overlay87
>  #define XEN_DOMCTL_gsi_permission88
>  #define XEN_DOMCTL_set_llc_colors89
> +
> +#define XEN_DOMCTL_get_sci_info  90
> +
>  #define XEN_DOMCTL_gdbsx_guestmemio1000
>  #define XEN_DOMCTL_gdbsx_pausevcpu 1001
>  #define XEN_DOMCTL_gdbsx_unpausevcpu   1002

The latter of the blank lines may make sense to add. There former shouldn't
be there imo.

Jan

Re: [PATCH 1/2] xen/arm: Improve handling of nr_spis

2025-03-11 Thread Bertrand Marquis




> On 11 Mar 2025, at 14:33, Orzel, Michal  wrote:
> 
> 
> 
> On 11/03/2025 14:26, Bertrand Marquis wrote:
>> 
>> 
>> Hi Michal,
>> 
>>> On 11 Mar 2025, at 12:06, Orzel, Michal  wrote:
>>> 
>>> 
>>> 
>>> On 11/03/2025 11:12, Bertrand Marquis wrote:
 
 
> On 11 Mar 2025, at 10:59, Orzel, Michal  wrote:
> 
> 
> 
> On 11/03/2025 10:30, Bertrand Marquis wrote:
>> 
>> 
>> Hi Michal,
>> 
>>> On 11 Mar 2025, at 10:04, Michal Orzel  wrote:
>>> 
>>> At the moment, we print a warning about max number of IRQs supported by
>>> GIC bigger than vGIC only for hardware domain. This check is not hwdom
>>> special, and should be made common. Also, in case of user not specifying
>>> nr_spis for dom0less domUs, we should take into account max number of
>>> IRQs supported by vGIC if it's smaller than for GIC.
>>> 
>>> Introduce VGIC_MAX_IRQS macro and use it instead of hardcoded 992 value.
>>> Fix calculation of nr_spis for dom0less domUs and make the GIC/vGIC max
>>> IRQs comparison common.
>>> 
>>> Signed-off-by: Michal Orzel 
>>> ---
>>> xen/arch/arm/dom0less-build.c   | 2 +-
>>> xen/arch/arm/domain_build.c | 9 ++---
>>> xen/arch/arm/gic.c  | 3 +++
>>> xen/arch/arm/include/asm/vgic.h | 3 +++
>>> 4 files changed, 9 insertions(+), 8 deletions(-)
>>> 
>>> diff --git a/xen/arch/arm/dom0less-build.c 
>>> b/xen/arch/arm/dom0less-build.c
>>> index 31f31c38da3f..9a84fee94119 100644
>>> --- a/xen/arch/arm/dom0less-build.c
>>> +++ b/xen/arch/arm/dom0less-build.c
>>> @@ -1018,7 +1018,7 @@ void __init create_domUs(void)
>>>  {
>>>  int vpl011_virq = GUEST_VPL011_SPI;
>>> 
>>> -d_cfg.arch.nr_spis = gic_number_lines() - 32;
>>> +d_cfg.arch.nr_spis = min(gic_number_lines(), 
>>> VGIC_MAX_IRQS) - 32;
>> 
>> I would suggest to introduce a static inline gic_nr_spis in a gic header 
>> ...
> Why GIC and not vGIC? This is domain's nr_spis, so vGIC.
 
 yes vGIC sorry.
 
> But then, why static inline if the value does not change and is domain 
> agnostic?
> I struggle to find a good name for this macro. Maybe (in vgic.h):
> #define vgic_def_nr_spis (min(gic_number_lines(), VGIC_MAX_IRQS) - 32)
> to denote default nr_spis if not set by the user?
 
 Yes that would work. My point is to prevent to have 2 definitions in 2 
 different
 source file and a risk to forget to update one and not the other (let say 
 if some
 day we change 32 in 64).
 
> 
>> 
>>> 
>>>  /*
>>>   * The VPL011 virq is GUEST_VPL011_SPI, unless direct-map is
>>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>>> index 7cc141ef75e9..b99c4e3a69bf 100644
>>> --- a/xen/arch/arm/domain_build.c
>>> +++ b/xen/arch/arm/domain_build.c
>>> @@ -2371,13 +2371,8 @@ void __init create_dom0(void)
>>> 
>>>  /* The vGIC for DOM0 is exactly emulating the hardware GIC */
>>>  dom0_cfg.arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE;
>>> -/*
>>> - * Xen vGIC supports a maximum of 992 interrupt lines.
>>> - * 32 are substracted to cover local IRQs.
>>> - */
>>> -dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 
>>> 992) - 32;
>>> -if ( gic_number_lines() > 992 )
>>> -printk(XENLOG_WARNING "Maximum number of vGIC IRQs 
>>> exceeded.\n");
>>> +/* 32 are substracted to cover local IRQs */
>>> +dom0_cfg.arch.nr_spis = min(gic_number_lines(), VGIC_MAX_IRQS) - 
>>> 32;
>> 
>> and reuse it here to make sure the value used is always the same.
>> 
>>>  dom0_cfg.arch.tee_type = tee_get_type();
>>>  dom0_cfg.max_vcpus = dom0_max_vcpus();
>>> 
>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> index acf61a4de373..e80fe0ca2421 100644
>>> --- a/xen/arch/arm/gic.c
>>> +++ b/xen/arch/arm/gic.c
>>> @@ -251,6 +251,9 @@ void __init gic_init(void)
>>>  panic("Failed to initialize the GIC drivers\n");
>>>  /* Clear LR mask for cpu0 */
>>>  clear_cpu_lr_mask();
>>> +
>>> +if ( gic_number_lines() > VGIC_MAX_IRQS )
>>> +printk(XENLOG_WARNING "Maximum number of vGIC IRQs 
>>> exceeded\n");
>> 
>> I am a bit unsure with this one.
>> If this is the case, it means any gicv2 or gicv3 init detected an 
>> impossible value and
>> any usage of gic_number_lines would be using an impossible value.
> Why impossible? GIC can support up to 1020 IRQs. Our vGIC can support up 
> to 992
> IRQs.
 
 Maybe unsupported is a better wording, my point is that it could lead to 
 non working system
 if say something uses irq 1000.
>>> Actually, I took a look at the code and I don't think we should panic (i.e. 
>>>

Re: [PATCH 1/2] xen/arm: Improve handling of nr_spis

2025-03-11 Thread Orzel, Michal




On 11/03/2025 14:26, Bertrand Marquis wrote:
> 
> 
> Hi Michal,
> 
>> On 11 Mar 2025, at 12:06, Orzel, Michal  wrote:
>>
>>
>>
>> On 11/03/2025 11:12, Bertrand Marquis wrote:
>>>
>>>
 On 11 Mar 2025, at 10:59, Orzel, Michal  wrote:



 On 11/03/2025 10:30, Bertrand Marquis wrote:
>
>
> Hi Michal,
>
>> On 11 Mar 2025, at 10:04, Michal Orzel  wrote:
>>
>> At the moment, we print a warning about max number of IRQs supported by
>> GIC bigger than vGIC only for hardware domain. This check is not hwdom
>> special, and should be made common. Also, in case of user not specifying
>> nr_spis for dom0less domUs, we should take into account max number of
>> IRQs supported by vGIC if it's smaller than for GIC.
>>
>> Introduce VGIC_MAX_IRQS macro and use it instead of hardcoded 992 value.
>> Fix calculation of nr_spis for dom0less domUs and make the GIC/vGIC max
>> IRQs comparison common.
>>
>> Signed-off-by: Michal Orzel 
>> ---
>> xen/arch/arm/dom0less-build.c   | 2 +-
>> xen/arch/arm/domain_build.c | 9 ++---
>> xen/arch/arm/gic.c  | 3 +++
>> xen/arch/arm/include/asm/vgic.h | 3 +++
>> 4 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/xen/arch/arm/dom0less-build.c 
>> b/xen/arch/arm/dom0less-build.c
>> index 31f31c38da3f..9a84fee94119 100644
>> --- a/xen/arch/arm/dom0less-build.c
>> +++ b/xen/arch/arm/dom0less-build.c
>> @@ -1018,7 +1018,7 @@ void __init create_domUs(void)
>>   {
>>   int vpl011_virq = GUEST_VPL011_SPI;
>>
>> -d_cfg.arch.nr_spis = gic_number_lines() - 32;
>> +d_cfg.arch.nr_spis = min(gic_number_lines(), VGIC_MAX_IRQS) 
>> - 32;
>
> I would suggest to introduce a static inline gic_nr_spis in a gic header 
> ...
 Why GIC and not vGIC? This is domain's nr_spis, so vGIC.
>>>
>>> yes vGIC sorry.
>>>
 But then, why static inline if the value does not change and is domain 
 agnostic?
 I struggle to find a good name for this macro. Maybe (in vgic.h):
 #define vgic_def_nr_spis (min(gic_number_lines(), VGIC_MAX_IRQS) - 32)
 to denote default nr_spis if not set by the user?
>>>
>>> Yes that would work. My point is to prevent to have 2 definitions in 2 
>>> different
>>> source file and a risk to forget to update one and not the other (let say 
>>> if some
>>> day we change 32 in 64).
>>>

>
>>
>>   /*
>>* The VPL011 virq is GUEST_VPL011_SPI, unless direct-map is
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index 7cc141ef75e9..b99c4e3a69bf 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -2371,13 +2371,8 @@ void __init create_dom0(void)
>>
>>   /* The vGIC for DOM0 is exactly emulating the hardware GIC */
>>   dom0_cfg.arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE;
>> -/*
>> - * Xen vGIC supports a maximum of 992 interrupt lines.
>> - * 32 are substracted to cover local IRQs.
>> - */
>> -dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 992) 
>> - 32;
>> -if ( gic_number_lines() > 992 )
>> -printk(XENLOG_WARNING "Maximum number of vGIC IRQs 
>> exceeded.\n");
>> +/* 32 are substracted to cover local IRQs */
>> +dom0_cfg.arch.nr_spis = min(gic_number_lines(), VGIC_MAX_IRQS) - 32;
>
> and reuse it here to make sure the value used is always the same.
>
>>   dom0_cfg.arch.tee_type = tee_get_type();
>>   dom0_cfg.max_vcpus = dom0_max_vcpus();
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index acf61a4de373..e80fe0ca2421 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -251,6 +251,9 @@ void __init gic_init(void)
>>   panic("Failed to initialize the GIC drivers\n");
>>   /* Clear LR mask for cpu0 */
>>   clear_cpu_lr_mask();
>> +
>> +if ( gic_number_lines() > VGIC_MAX_IRQS )
>> +printk(XENLOG_WARNING "Maximum number of vGIC IRQs exceeded\n");
>
> I am a bit unsure with this one.
> If this is the case, it means any gicv2 or gicv3 init detected an 
> impossible value and
> any usage of gic_number_lines would be using an impossible value.
 Why impossible? GIC can support up to 1020 IRQs. Our vGIC can support up 
 to 992
 IRQs.
>>>
>>> Maybe unsupported is a better wording, my point is that it could lead to 
>>> non working system
>>> if say something uses irq 1000.
>> Actually, I took a look at the code and I don't think we should panic (i.e. 
>> we
>> should keep things as they are today). In your example, if something uses 
>> IRQ >
>> VGIC_MAX_IRQS that is bigger than gic_number_lines(), then we will receive 
>> error
>> when mapping this IRQ to guest (but you don

Re: [PATCH v2] xen/page_alloc: Simplify domain_adjust_tot_pages

2025-03-11 Thread Alejandro Vallejo

On Tue Mar 11, 2025 at 12:46 PM GMT, Roger Pau Monné wrote:
> On Tue, Mar 04, 2025 at 11:10:00AM +, Alejandro Vallejo wrote:
> > The logic has too many levels of indirection and it's very hard to
> > understand it its current form. Split it between the corner case where
> > the adjustment is bigger than the current claim and the rest to avoid 5
> > auxiliary variables.
> > 
> > Add a functional change to prevent negative adjustments from
> > re-increasing the claim. This has the nice side effect of avoiding
> > taking the heap lock here on every free.
> > 
> > While at it, fix incorrect field name in nearby comment.
> > 
> > Signed-off-by: Alejandro Vallejo 
>
> Acked-by: Roger Pau Monné 

Thanks.

> I think it would be nice to also ensure that once the domain is
> finished building (maybe when it's unpaused for the first
> time?) d->outstanding_pages is set to 0.  IMO the claim system was
> designed to avoid races during domain building, and shouldn't be used
> once the domain is already running.
>
> Thanks, Roger.

As a matter of implementation that's already the case by toolstack being "nice"
and unconditionally clearing claims after populating the physmap.

However, I agree the hypervisor should do it on its own. I didn't find a
suitable place for it. It may be possible to do it on every unpause with
minimal overhead because it doesn't need to take the heap lock unless there are
outstanding claims on the domains. Would that sound like an ok idea? I'd rather
not add even more state into "struct domain" to count pauses...

Cheers,
Alejandro

[PATCH v4 21/25] drm/virtio: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-03-11 Thread Thomas Zimmermann

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. Align the pitch to a multiple of 4.

Signed-off-by: Thomas Zimmermann 
Cc: David Airlie 
Cc: Gerd Hoffmann 
Cc: Gurchetan Singh 
Cc: Chia-I Wu 
---
 drivers/gpu/drm/virtio/virtgpu_gem.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c 
b/drivers/gpu/drm/virtio/virtgpu_gem.c
index dde8fc1a3689..5e5e38d53990 100644
--- a/drivers/gpu/drm/virtio/virtgpu_gem.c
+++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
@@ -23,6 +23,7 @@
  * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 
+#include 
 #include 
 #include 
 
@@ -66,15 +67,14 @@ int virtio_gpu_mode_dumb_create(struct drm_file *file_priv,
struct virtio_gpu_object_params params = { 0 };
struct virtio_gpu_device *vgdev = dev->dev_private;
int ret;
-   uint32_t pitch;
+
+   ret = drm_mode_size_dumb(dev, args, SZ_4, 0);
+   if (ret)
+   return ret;
 
if (args->bpp != 32)
return -EINVAL;
 
-   pitch = args->width * 4;
-   args->size = pitch * args->height;
-   args->size = ALIGN(args->size, PAGE_SIZE);
-
params.format = virtio_gpu_translate_format(DRM_FORMAT_HOST_XRGB);
params.width = args->width;
params.height = args->height;
@@ -92,7 +92,6 @@ int virtio_gpu_mode_dumb_create(struct drm_file *file_priv,
if (ret)
goto fail;
 
-   args->pitch = pitch;
return ret;
 
 fail:
-- 
2.48.1

[PATCH v4 01/25] drm/dumb-buffers: Sanitize output on errors

2025-03-11 Thread Thomas Zimmermann

The ioctls MODE_CREATE_DUMB and MODE_MAP_DUMB return results into a
memory buffer supplied by user space. On errors, it is possible that
intermediate values are being returned. The exact semantics depends
on the DRM driver's implementation of these ioctls. Although this is
most-likely not a security problem in practice, avoid any uncertainty
by clearing the memory to 0 on errors.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_dumb_buffers.c | 40 ++
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_dumb_buffers.c 
b/drivers/gpu/drm/drm_dumb_buffers.c
index 70032bba1c97..9916aaf5b3f2 100644
--- a/drivers/gpu/drm/drm_dumb_buffers.c
+++ b/drivers/gpu/drm/drm_dumb_buffers.c
@@ -99,7 +99,30 @@ int drm_mode_create_dumb(struct drm_device *dev,
 int drm_mode_create_dumb_ioctl(struct drm_device *dev,
   void *data, struct drm_file *file_priv)
 {
-   return drm_mode_create_dumb(dev, data, file_priv);
+   struct drm_mode_create_dumb *args = data;
+   int err;
+
+   err = drm_mode_create_dumb(dev, args, file_priv);
+   if (err) {
+   args->handle = 0;
+   args->pitch = 0;
+   args->size = 0;
+   }
+   return err;
+}
+
+static int drm_mode_mmap_dumb(struct drm_device *dev, struct drm_mode_map_dumb 
*args,
+ struct drm_file *file_priv)
+{
+   if (!dev->driver->dumb_create)
+   return -ENOSYS;
+
+   if (dev->driver->dumb_map_offset)
+   return dev->driver->dumb_map_offset(file_priv, dev, 
args->handle,
+   &args->offset);
+   else
+   return drm_gem_dumb_map_offset(file_priv, dev, args->handle,
+  &args->offset);
 }
 
 /**
@@ -120,17 +143,12 @@ int drm_mode_mmap_dumb_ioctl(struct drm_device *dev,
 void *data, struct drm_file *file_priv)
 {
struct drm_mode_map_dumb *args = data;
+   int err;
 
-   if (!dev->driver->dumb_create)
-   return -ENOSYS;
-
-   if (dev->driver->dumb_map_offset)
-   return dev->driver->dumb_map_offset(file_priv, dev,
-   args->handle,
-   &args->offset);
-   else
-   return drm_gem_dumb_map_offset(file_priv, dev, args->handle,
-  &args->offset);
+   err = drm_mode_mmap_dumb(dev, args, file_priv);
+   if (err)
+   args->offset = 0;
+   return err;
 }
 
 int drm_mode_destroy_dumb(struct drm_device *dev, u32 handle,
-- 
2.48.1

[PATCH v2 14/16] include/exec/memory: extract devend_big_endian from devend_memop

2025-03-11 Thread Pierrick Bouvier

we'll use it in system/memory.c.

Signed-off-by: Pierrick Bouvier 
---
 include/exec/memory.h | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 60c0fb6ccd4..57661283684 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3138,16 +3138,22 @@ address_space_write_cached(MemoryRegionCache *cache, 
hwaddr addr,
 MemTxResult address_space_set(AddressSpace *as, hwaddr addr,
   uint8_t c, hwaddr len, MemTxAttrs attrs);
 
-/* enum device_endian to MemOp.  */
-static inline MemOp devend_memop(enum device_endian end)
+/* returns true if end is big endian. */
+static inline bool devend_big_endian(enum device_endian end)
 {
 QEMU_BUILD_BUG_ON(DEVICE_HOST_ENDIAN != DEVICE_LITTLE_ENDIAN &&
   DEVICE_HOST_ENDIAN != DEVICE_BIG_ENDIAN);
 
-bool big_endian = (end == DEVICE_NATIVE_ENDIAN
-   ? target_words_bigendian()
-   : end == DEVICE_BIG_ENDIAN);
-return big_endian ? MO_BE : MO_LE;
+if (end == DEVICE_NATIVE_ENDIAN) {
+return target_words_bigendian();
+}
+return end == DEVICE_BIG_ENDIAN;
+}
+
+/* enum device_endian to MemOp.  */
+static inline MemOp devend_memop(enum device_endian end)
+{
+return devend_big_endian(end) ? MO_BE : MO_LE;
 }
 
 /*
-- 
2.39.5

Re: [PATCH v2] docs: specify numerical values of Xenstore commands

2025-03-11 Thread Jürgen Groß


On 11.03.25 10:21, Julien Grall wrote:

Hi Juergen,

On 11/03/2025 07:31, Juergen Gross wrote:

In docs/misc/xenstore.txt all Xenstore commands are specified, but
the specifications lack the numerical values of the commands.

Add a table with all commands, their values, and a potential remark
(e.g. whether the command is optional).

Reported-by: Jan Beulich 
Signed-off-by: Juergen Gross 
---
V2:
- replace "ŕ" with plain "r" (Jan Beulich)
- replace hard tabs with blanks (Jan Beulich)
- allow GET_FEATURES and GET_QUOTA support without SET_* (Jan Beulich)
---
  docs/misc/xenstore.txt | 57 ++
  1 file changed, 57 insertions(+)

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index 7e1f031520..8b4b790e11 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -86,6 +86,63 @@ parts of xenstore inaccessible to some clients.  In any 
case passing

  bulk data through xenstore is not recommended as the performance
  properties are poor.
+-- Defined Xenstore message types --
+
+Below is a table with all defined Xenstore message types (type name
+and its associated numerical value).
+
+Some types are optional to be supported by a specific Xenstore
+implementation.  If an optional type is not supported by a Xenstore
+implementation, Xen tools will continue to work, maybe with slightly
+reduced functionality.  A mandatory type not being supported will
+result in severely reduced functionality, like inability to create
+domains.  In case a type is optional, this is stated in the table with
+the lost functionality in case Xenstore doesn't support that type.
+
+XS_CONTROL   0    optional
+    If not supported, xenstore-control command will not work.


Are we documenting anywhere how a user could figure out if the command is not 
supported? Is it a specific error code?


I can add that not supported commands will return "ENOSYS" as an error
response.




+    XS_DEBUG is a deprecated alias of XS_CONTROL.

 > +XS_DIRECTORY 1> +XS_READ  2

+XS_GET_PERMS 3
+XS_WATCH 4
+XS_UNWATCH   5
+XS_TRANSACTION_START 6
+XS_TRANSACTION_END   7
+XS_INTRODUCE 8
+XS_RELEASE   9
+XS_GET_DOMAIN_PATH  10
+XS_WRITE    11
+XS_MKDIR    12
+XS_RM   13
+XS_SET_PERMS    14
+XS_WATCH_EVENT  15
+    Not valid in client sent messages.
+    Only valid in Xenstore replies.
+XS_ERROR    16
+    Not valid in client sent messages.
+    Only valid in Xenstore replies.
+XS_IS_DOMAIN_INTRODUCED 17
+XS_RESUME   18
+XS_SET_TARGET   19
+XS_RESTRICT 20    no longer supported
+    XS_RESTRICT has been removed, the type value 20 is invalid.
+XS_RESET_WATCHES    21
+XS_DIRECTORY_PART   22    optional
+    If not supported, the output of xenstore-ls might be incomplete
+    with more than ca. 1000 domains active.


Why are we making this specific to number of domains? Wouldn't the problem be 
the same if you have 1000 node within a parent node?


Let me reiterate the answer I gave to Jan when he asked:

  And elsewhere there can't be very many nodes resulting in a similar
  situation?

Not without someone trying to force this by will (only possible by a
root user of dom0, e.g. via creating lots of nodes in the same directory
via "xenstore-write").

Having lots of domains is an "easy" way to create this scenario via
perfectly valid and sensible operations.


I can rephrase the remark like this:

If not supported, the output of xenstore-ls might be incomplete
with a node's sub-node list exceeding the maximum payload size
(e.g. the "/local/domain" node with more than ca. 1000 domains
active).


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature

Re: [PATCH v3 3/3] xen: mem_access: conditionally compile vm_event.c & monitor.c

2025-03-11 Thread Jan Beulich

On 11.03.2025 11:27, Sergiy Kibrik wrote:
> From: Stefano Stabellini 
> 
> Extend coverage of CONFIG_VM_EVENT option and make the build of VM events
> and monitoring support optional. Also make MEM_PAGING option depend on 
> VM_EVENT
> to document that mem_paging is relying on vm_event.
> This is to reduce code size on Arm when this option isn't enabled.
> 
> CC: Jan Beulich 
> CC: Tamas K Lengyel 
> Reviewed-by: Ayan Kumar Halder 
> Signed-off-by: Stefano Stabellini 
> Signed-off-by: Sergiy Kibrik 

Please can tags be kept in chronological order? It's impossible to review
a patch that wasn't first signed-off on by the author(s).

> ---
> changes in v3:
>  - add dependency MEM_PAGING -> VM_EVENT
>  - monitor_domctl() stub returns -EOPNOTSUPP

Seeing this, i.e. ...

> --- a/xen/include/xen/monitor.h
> +++ b/xen/include/xen/monitor.h
> @@ -27,8 +27,17 @@
>  struct domain;
>  struct xen_domctl_monitor_op;
>  
> +#ifdef CONFIG_VM_EVENT
>  int monitor_domctl(struct domain *d, struct xen_domctl_monitor_op *mop);
>  void monitor_guest_request(void);
> +#else
> +static inline int monitor_domctl(struct domain *d,
> + struct xen_domctl_monitor_op *mop)
> +{
> +return -EOPNOTSUPP;

... this, why ...

> @@ -88,7 +85,18 @@ void vm_event_cancel_slot(struct domain *d, struct 
> vm_event_domain *ved);
>  void vm_event_put_request(struct domain *d, struct vm_event_domain *ved,
>vm_event_request_t *req);
>  
> +#ifdef CONFIG_VM_EVENT
> +/* Clean up on domain destruction */
> +void vm_event_cleanup(struct domain *d);
>  int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec);
> +#else
> +static inline void vm_event_cleanup(struct domain *d) {}
> +static inline int vm_event_domctl(struct domain *d,
> +  struct xen_domctl_vm_event_op *vec)
> +{
> +return -EINVAL;

... is it EINVAL here?

Jan

Re: [PATCH v2] docs: specify numerical values of Xenstore commands

2025-03-11 Thread Julien Grall


Hi Juergen,

On 11/03/2025 07:31, Juergen Gross wrote:

In docs/misc/xenstore.txt all Xenstore commands are specified, but
the specifications lack the numerical values of the commands.

Add a table with all commands, their values, and a potential remark
(e.g. whether the command is optional).

Reported-by: Jan Beulich 
Signed-off-by: Juergen Gross 
---
V2:
- replace "ŕ" with plain "r" (Jan Beulich)
- replace hard tabs with blanks (Jan Beulich)
- allow GET_FEATURES and GET_QUOTA support without SET_* (Jan Beulich)
---
  docs/misc/xenstore.txt | 57 ++
  1 file changed, 57 insertions(+)

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index 7e1f031520..8b4b790e11 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -86,6 +86,63 @@ parts of xenstore inaccessible to some clients.  In any case 
passing
  bulk data through xenstore is not recommended as the performance
  properties are poor.
  
+-- Defined Xenstore message types --

+
+Below is a table with all defined Xenstore message types (type name
+and its associated numerical value).
+
+Some types are optional to be supported by a specific Xenstore
+implementation.  If an optional type is not supported by a Xenstore
+implementation, Xen tools will continue to work, maybe with slightly
+reduced functionality.  A mandatory type not being supported will
+result in severely reduced functionality, like inability to create
+domains.  In case a type is optional, this is stated in the table with
+the lost functionality in case Xenstore doesn't support that type.
+
+XS_CONTROL   0optional
+If not supported, xenstore-control command will not work.


Are we documenting anywhere how a user could figure out if the command 
is not supported? Is it a specific error code?



+XS_DEBUG is a deprecated alias of XS_CONTROL.

> +XS_DIRECTORY 1> +XS_READ  2

+XS_GET_PERMS 3
+XS_WATCH 4
+XS_UNWATCH   5
+XS_TRANSACTION_START 6
+XS_TRANSACTION_END   7
+XS_INTRODUCE 8
+XS_RELEASE   9
+XS_GET_DOMAIN_PATH  10
+XS_WRITE11
+XS_MKDIR12
+XS_RM   13
+XS_SET_PERMS14
+XS_WATCH_EVENT  15
+Not valid in client sent messages.
+Only valid in Xenstore replies.
+XS_ERROR16
+Not valid in client sent messages.
+Only valid in Xenstore replies.
+XS_IS_DOMAIN_INTRODUCED 17
+XS_RESUME   18
+XS_SET_TARGET   19
+XS_RESTRICT 20no longer supported
+XS_RESTRICT has been removed, the type value 20 is invalid.
+XS_RESET_WATCHES21
+XS_DIRECTORY_PART   22optional
+If not supported, the output of xenstore-ls might be incomplete
+with more than ca. 1000 domains active.


Why are we making this specific to number of domains? Wouldn't the 
problem be the same if you have 1000 node within a parent node?



+XS_GET_FEATURE  23optional
+XS_SET_FEATURE  24optional
+XS_SET_FEATURE requires XS_GET_FEATURE to be supported.
+If unsupported, setting availability of Xenstore features per
+domain is not possible.
+XS_GET_QUOTA25optional
+XS_SET_QUOTA26optional
+XS_SET_QUOTA requires XS_GET_QUOTA to be supported.
+If unsupported, setting of Xenstore quota per domain is not
+possible.
+XS_INVALID   65535
+Guaranteed invalid type (never supported).
  
  -- Xenstore protocol details - introduction --
  


Cheers,

--
Julien Grall

[PATCH v4 3/3] x86/iommu: avoid MSI address and data writes if IRT index hasn't changed

2025-03-11 Thread Roger Pau Monne

Attempt to reduce the MSI entry writes, and the associated checking whether
memory decoding and MSI-X is enabled for the PCI device, when the MSI data
hasn't changed.

When using Interrupt Remapping the MSI entry will contain an index into
the remapping table, and it's in such remapping table where the MSI vector
and destination CPU is stored.  As such, when using interrupt remapping,
changes to the interrupt affinity shouldn't result in changes to the MSI
entry, and the MSI entry update can be avoided.

Signal from the IOMMU update_ire_from_msi hook whether the MSI data or
address fields have changed, and thus need writing to the device registers.
Such signaling is done by returning 1 from the function.  Otherwise
returning 0 means no update of the MSI fields, and thus no write
required.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Cc: Ross Lagerwall 
---
Changes since v3:
 - Assert MSI fields are never updated in vmx_pi_update_irte().
 - Directly return booleans from msi_msg_to_remap_entry() and
   update_intremap_entry_from_msi_msg().

Changes since v2:
 - New approach.

Changes since v1:
 - Add more comments.
 - Simplify dma_msi_set_affinity().
---
 xen/arch/x86/hpet.c  |  6 +-
 xen/arch/x86/hvm/vmx/vmx.c   | 13 -
 xen/arch/x86/msi.c   | 11 ++-
 xen/drivers/passthrough/amd/iommu_intr.c |  4 ++--
 xen/drivers/passthrough/vtd/intremap.c   |  4 +++-
 xen/include/xen/iommu.h  |  6 ++
 6 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 51ff7f12f5c0..1bca8c8b670d 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -283,8 +283,12 @@ static int hpet_msi_write(struct hpet_event_channel *ch, 
struct msi_msg *msg)
 {
 int rc = iommu_update_ire_from_msi(&ch->msi, msg);
 
-if ( rc )
+if ( rc < 0 )
 return rc;
+/*
+ * Always propagate writes, to avoid having to pass a flag for handling
+ * a forceful write in the resume from suspension case.
+ */
 }
 
 hpet_write32(msg->data, HPET_Tn_ROUTE(ch->idx));
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c407513af624..7535a45b31fc 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -422,7 +422,18 @@ static int cf_check vmx_pi_update_irte(const struct vcpu 
*v,
 
 ASSERT_PDEV_LIST_IS_READ_LOCKED(msi_desc->dev->domain);
 
-return iommu_update_ire_from_msi(msi_desc, &msg);
+rc = iommu_update_ire_from_msi(msi_desc, &msg);
+if ( rc > 0 )
+{
+/*
+ * Callers of vmx_pi_update_irte() won't propagate the updated MSI
+ * fields to the hardware, must assert there are no changes.
+ */
+ASSERT_UNREACHABLE();
+rc = -EILSEQ;
+}
+
+return rc;
 
  unlock_out:
 spin_unlock_irq(&desc->lock);
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 6c11d76015fb..163ccf874720 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -184,7 +184,8 @@ void msi_compose_msg(unsigned vector, const cpumask_t 
*cpu_mask, struct msi_msg
 MSI_DATA_VECTOR(vector);
 }
 
-static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
+static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg,
+ bool force)
 {
 entry->msg = *msg;
 
@@ -194,7 +195,7 @@ static int write_msi_msg(struct msi_desc *entry, struct 
msi_msg *msg)
 
 ASSERT(msg != &entry->msg);
 rc = iommu_update_ire_from_msi(entry, msg);
-if ( rc )
+if ( rc < 0 || (rc == 0 && !force) )
 return rc;
 }
 
@@ -259,7 +260,7 @@ void cf_check set_msi_affinity(struct irq_desc *desc, const 
cpumask_t *mask)
 msg.address_lo |= MSI_ADDR_DEST_ID(dest);
 msg.dest32 = dest;
 
-write_msi_msg(msi_desc, &msg);
+write_msi_msg(msi_desc, &msg, false);
 }
 
 void __msi_set_enable(pci_sbdf_t sbdf, int pos, int enable)
@@ -522,7 +523,7 @@ int __setup_msi_irq(struct irq_desc *desc, struct msi_desc 
*msidesc,
 desc->msi_desc = msidesc;
 desc->handler = handler;
 msi_compose_msg(desc->arch.vector, desc->arch.cpu_mask, &msg);
-ret = write_msi_msg(msidesc, &msg);
+ret = write_msi_msg(msidesc, &msg, false);
 if ( unlikely(ret) )
 {
 desc->handler = &no_irq_type;
@@ -1403,7 +1404,7 @@ int pci_restore_msi_state(struct pci_dev *pdev)
 type = entry->msi_attrib.type;
 
 msg = entry->msg;
-write_msi_msg(entry, &msg);
+write_msi_msg(entry, &msg, true);
 
 for ( i = 0; ; )
 {
diff --git a/xen/drivers/passthrough/amd/iommu_intr.c 
b/xen/drivers/passthrough/amd/iommu_intr.c
index c0273059cb1d..9abdc38053d7 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -492,7 +492,7 @@ static int update_intremap_entry_from_msi_msg(
g

Re: [PATCH v3 1/3] xen: kconfig: rename MEM_ACCESS -> VM_EVENT

2025-03-11 Thread Tamas K Lengyel

On Tue, Mar 11, 2025 at 8:01 AM Jan Beulich  wrote:
>
> On 11.03.2025 11:23, Sergiy Kibrik wrote:
> > --- a/xen/common/Kconfig
> > +++ b/xen/common/Kconfig
> > @@ -92,7 +92,7 @@ config HAS_VMAP
> >  config MEM_ACCESS_ALWAYS_ON
> >   bool
> >
> > -config MEM_ACCESS
> > +config VM_EVENT
> >   def_bool MEM_ACCESS_ALWAYS_ON
> >   prompt "Memory Access and VM events" if !MEM_ACCESS_ALWAYS_ON
> >   depends on HVM
>
> I still don't see why we would then stick to MEM_ACCESS_ALWAYS_ON as a name
> for the sibling option.

I'm not opposed to renaming it but it's also not something I see much
value in doing. It's not used anywhere in the code, it's purely used
in the config selector to mark that on x86 the
vm_event/mem_access/monitor bits are not setup to be compiled-out.

Tamas

Re: [PATCH v3 1/3] xen: kconfig: rename MEM_ACCESS -> VM_EVENT

2025-03-11 Thread Sergiy Kibrik


11.03.25 14:01, Jan Beulich:

On 11.03.2025 11:23, Sergiy Kibrik wrote:

--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -92,7 +92,7 @@ config HAS_VMAP
  config MEM_ACCESS_ALWAYS_ON
bool
  
-config MEM_ACCESS

+config VM_EVENT
def_bool MEM_ACCESS_ALWAYS_ON
prompt "Memory Access and VM events" if !MEM_ACCESS_ALWAYS_ON
depends on HVM


I still don't see why we would then stick to MEM_ACCESS_ALWAYS_ON as a name
for the sibling option.



as HVM requires mem-access to be enabled I felt like it should be kept 
as is -- to hint that it is mem-access that ties vm_event to hvm.


 -Sergiy

Re: [PATCH v3 3/3] xen: mem_access: conditionally compile vm_event.c & monitor.c

2025-03-11 Thread Tamas K Lengyel

On Tue, Mar 11, 2025 at 7:59 AM Jan Beulich  wrote:
>
> On 11.03.2025 11:27, Sergiy Kibrik wrote:
> > From: Stefano Stabellini 
> >
> > Extend coverage of CONFIG_VM_EVENT option and make the build of VM events
> > and monitoring support optional. Also make MEM_PAGING option depend on 
> > VM_EVENT
> > to document that mem_paging is relying on vm_event.
> > This is to reduce code size on Arm when this option isn't enabled.
> >
> > CC: Jan Beulich 
> > CC: Tamas K Lengyel 
> > Reviewed-by: Ayan Kumar Halder 
> > Signed-off-by: Stefano Stabellini 
> > Signed-off-by: Sergiy Kibrik 
>
> Please can tags be kept in chronological order? It's impossible to review
> a patch that wasn't first signed-off on by the author(s).
>
> > ---
> > changes in v3:
> >  - add dependency MEM_PAGING -> VM_EVENT

This seems to be largely unnecessary since on x86 selecting HVM
already selects it but I guess it also doesn't hurt to explicitly mark
it like this either.

> >  - monitor_domctl() stub returns -EOPNOTSUPP
>
> Seeing this, i.e. ...
>
> > --- a/xen/include/xen/monitor.h
> > +++ b/xen/include/xen/monitor.h
> > @@ -27,8 +27,17 @@
> >  struct domain;
> >  struct xen_domctl_monitor_op;
> >
> > +#ifdef CONFIG_VM_EVENT
> >  int monitor_domctl(struct domain *d, struct xen_domctl_monitor_op *mop);
> >  void monitor_guest_request(void);
> > +#else
> > +static inline int monitor_domctl(struct domain *d,
> > + struct xen_domctl_monitor_op *mop)
> > +{
> > +return -EOPNOTSUPP;
>
> ... this, why ...
>
> > @@ -88,7 +85,18 @@ void vm_event_cancel_slot(struct domain *d, struct 
> > vm_event_domain *ved);
> >  void vm_event_put_request(struct domain *d, struct vm_event_domain *ved,
> >vm_event_request_t *req);
> >
> > +#ifdef CONFIG_VM_EVENT
> > +/* Clean up on domain destruction */
> > +void vm_event_cleanup(struct domain *d);
> >  int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec);
> > +#else
> > +static inline void vm_event_cleanup(struct domain *d) {}
> > +static inline int vm_event_domctl(struct domain *d,
> > +  struct xen_domctl_vm_event_op *vec)
> > +{
> > +return -EINVAL;
>
> ... is it EINVAL here?

I would prefer if it was consistent too with -EOPNOTSUPP for both when
the subsystems are compiled out.

Thanks,
Tamas

Re: [PATCH v2 1/7] xen/arm: allow PCI host bridge to have private data

2025-03-11 Thread Grygorii Strashko





On 11.03.25 12:24, Mykyta Poturai wrote:

From: Oleksandr Andrushchenko 

Some of the PCI host bridges require private data. Create a generic
approach for that, so such bridges may request the private data to
be allocated during initialization.

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Mykyta Poturai 
---
v1->v2:
* no change
---
  xen/arch/arm/include/asm/pci.h  |  4 +++-
  xen/arch/arm/pci/pci-host-common.c  | 18 +-
  xen/arch/arm/pci/pci-host-generic.c |  2 +-
  xen/arch/arm/pci/pci-host-zynqmp.c  |  2 +-
  4 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/include/asm/pci.h b/xen/arch/arm/include/asm/pci.h
index 7f77226c9b..44344ac8c1 100644
--- a/xen/arch/arm/include/asm/pci.h
+++ b/xen/arch/arm/include/asm/pci.h
@@ -66,6 +66,7 @@ struct pci_host_bridge {
  uint16_t segment;/* Segment number */
  struct pci_config_window* cfg;   /* Pointer to the bridge config window */
  const struct pci_ops *ops;
+void *priv;  /* Private data of the bridge. */
  };
  
  struct pci_ops {

@@ -95,7 +96,8 @@ struct pci_ecam_ops {
  extern const struct pci_ecam_ops pci_generic_ecam_ops;
  
  int pci_host_common_probe(struct dt_device_node *dev,

-  const struct pci_ecam_ops *ops);
+  const struct pci_ecam_ops *ops,
+  size_t priv_sz);
  int pci_generic_config_read(struct pci_host_bridge *bridge, pci_sbdf_t sbdf,
  uint32_t reg, uint32_t len, uint32_t *value);
  int pci_generic_config_write(struct pci_host_bridge *bridge, pci_sbdf_t sbdf,
diff --git a/xen/arch/arm/pci/pci-host-common.c 
b/xen/arch/arm/pci/pci-host-common.c
index c0faf0f436..be7e6c3510 100644
--- a/xen/arch/arm/pci/pci-host-common.c
+++ b/xen/arch/arm/pci/pci-host-common.c
@@ -209,7 +209,8 @@ static int pci_bus_find_domain_nr(struct dt_device_node 
*dev)
  }
  
  int pci_host_common_probe(struct dt_device_node *dev,

-  const struct pci_ecam_ops *ops)
+  const struct pci_ecam_ops *ops,
+  size_t priv_sz)
  {
  struct pci_host_bridge *bridge;
  struct pci_config_window *cfg;
@@ -241,11 +242,26 @@ int pci_host_common_probe(struct dt_device_node *dev,
  printk(XENLOG_ERR "Inconsistent \"linux,pci-domain\" property in 
DT\n");
  BUG();
  }
+
  bridge->segment = domain;
+
+if ( priv_sz )
+{
+bridge->priv = xzalloc_bytes(priv_sz);
+if ( !bridge->priv )
+{
+err = -ENOMEM;
+goto err_priv;
+}
+}
+


I'd like to propose to move allocation into pci_alloc_host_bridge()
to keep mallocs in one place and do it earlier, before proceeding
with other initialization steps.

Also the pci_alloc_host_bridge() could be made static, seems.


  pci_add_host_bridge(bridge);
  
  return 0;
  
+err_priv:

+xfree(bridge->priv);
+
  err_exit:
  xfree(bridge);
  


[...]

-grygorii

Re: [PATCH v2 03/16] exec/memory_ldst: extract memory_ldst declarations from cpu-all.h

2025-03-11 Thread Pierrick Bouvier


On 3/10/25 21:08, Pierrick Bouvier wrote:

They are now accessible through exec/memory.h instead, and we make sure
all variants are available for common or target dependent code.

Signed-off-by: Pierrick Bouvier 
---
  include/exec/cpu-all.h | 12 
  include/exec/memory_ldst.h.inc |  4 
  2 files changed, 16 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index e56c064d46f..0e8205818a4 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -44,18 +44,6 @@
  
  #include "exec/hwaddr.h"
  
-#define SUFFIX

-#define ARG1 as
-#define ARG1_DECLAddressSpace *as
-#define TARGET_ENDIANNESS
-#include "exec/memory_ldst.h.inc"
-
-#define SUFFIX   _cached_slow
-#define ARG1 cache
-#define ARG1_DECLMemoryRegionCache *cache
-#define TARGET_ENDIANNESS
-#include "exec/memory_ldst.h.inc"
-
  static inline void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t 
val)
  {
  address_space_stl_notdirty(as, addr, val,
diff --git a/include/exec/memory_ldst.h.inc b/include/exec/memory_ldst.h.inc
index 92ad74e9560..7270235c600 100644
--- a/include/exec/memory_ldst.h.inc
+++ b/include/exec/memory_ldst.h.inc
@@ -19,7 +19,6 @@
   * License along with this library; if not, see 
.
   */
  
-#ifdef TARGET_ENDIANNESS

  uint16_t glue(address_space_lduw, SUFFIX)(ARG1_DECL,
  hwaddr addr, MemTxAttrs attrs, MemTxResult *result);
  uint32_t glue(address_space_ldl, SUFFIX)(ARG1_DECL,
@@ -34,7 +33,6 @@ void glue(address_space_stl, SUFFIX)(ARG1_DECL,
  hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result);
  void glue(address_space_stq, SUFFIX)(ARG1_DECL,
  hwaddr addr, uint64_t val, MemTxAttrs attrs, MemTxResult *result);
-#else
  uint8_t glue(address_space_ldub, SUFFIX)(ARG1_DECL,
  hwaddr addr, MemTxAttrs attrs, MemTxResult *result);
  uint16_t glue(address_space_lduw_le, SUFFIX)(ARG1_DECL,
@@ -63,9 +61,7 @@ void glue(address_space_stq_le, SUFFIX)(ARG1_DECL,
  hwaddr addr, uint64_t val, MemTxAttrs attrs, MemTxResult *result);
  void glue(address_space_stq_be, SUFFIX)(ARG1_DECL,
  hwaddr addr, uint64_t val, MemTxAttrs attrs, MemTxResult *result);
-#endif
  
  #undef ARG1_DECL

  #undef ARG1
  #undef SUFFIX
-#undef TARGET_ENDIANNESS


Just to track last Richard answer,
Posted on v1:

On 3/10/25 17:04, Pierrick Bouvier wrote:
>  From what I understand, non endian versions are simply passing 
DEVICE_NATIVE_ENDIAN as a
> parameter for address_space_ldl_internal, which will thus match the 
target endianness.

>
> So what is the risk for common code to call this?

You're right.  I failed to look at the current implementation
to see that it would already work.

Reviewed-by: Richard Henderson 

r~

Re: [PATCH v2 08/16] exec/memory-internal: remove dependency on cpu.h

2025-03-11 Thread Pierrick Bouvier


On 3/11/25 00:26, Philippe Mathieu-Daudé wrote:

On 11/3/25 05:08, Pierrick Bouvier wrote:

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 


Missing the "why" justification we couldn't do that before.


---
   include/exec/memory-internal.h | 2 --
   1 file changed, 2 deletions(-)

diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index 100c1237ac2..b729f3b25ad 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -20,8 +20,6 @@
   #ifndef MEMORY_INTERNAL_H
   #define MEMORY_INTERNAL_H
   
-#include "cpu.h"

-
   #ifndef CONFIG_USER_ONLY
   static inline AddressSpaceDispatch *flatview_to_dispatch(FlatView *fv)
   {




No direct dependency, but when a common code will include that (like 
system/memory.c), we can't have a dependency on cpu.h anymore.

I can reorder or squash commits if you prefer.

Re: [PATCH] x86/elf: Improve code generation in elf_core_save_regs()

2025-03-11 Thread Jan Beulich

On 11.03.2025 15:21, Andrew Cooper wrote:
> On 26/02/2025 8:44 am, Jan Beulich wrote:
>> On 26.02.2025 08:44, Jan Beulich wrote:
>>> On 25.02.2025 23:45, Andrew Cooper wrote:
 A CALL with 0 displacement is handled specially, and is why this logic
 functions even with CET Shadow Stacks active.  Nevertheless a rip-relative 
 LEA
 is the more normal way of doing this in 64bit code.

 The retrieval of flags modifies the stack pointer so needs to state a
 dependency on the stack pointer.  Despite it's name, ASM_CALL_CONSTRAINT is
 the way to do this.

 read_sreg() forces the answer through a register, causing code generation 
 of
 the form:

 mov%gs, %eax
 mov%eax, %eax
 mov%rax, 0x140(%rsi)

 Encode the reads directly with a memory operand.  This results in a 16bit
 store instead of an 64bit store, but the backing memory is zeroed.
>>> Raises the question whether we shouldn't change read_sreg(). At least the
>>> emulator uses of it would also benefit from storing straight to memory. And
>>> the remaining uses ought to be optimizable by the compiler, except that I
>>> don't expect we'd be able to express the zero-extending nature when the
>>> destination is a register. Or wait, maybe I can make up something (whether
>>> that's going to be liked is a separate question).
>> Here you go.
>>
>> Jan
>>
>> x86: make read_sreg() "bi-modal"
>>
>> Permit use sites to control whether to store directly to memory; right
>> now both elf_core_save_regs() and the insn emulator's put_fpu()
>> needlessly go through an intermediate GPR. Note that in both cases the
>> apparent loss of zero-extension isn't a problem: The fields written to
>> start out zero-initialized anyway.
>>
>> No change in generated code for the use sites not being touched.
>>
>> Signed-off-by: Jan Beulich 
>> ---
>> Whether to make the change to put_fpu() is up for discussion: In my
>> build it increases code size slightly, despite the reduction of number
>> of insns emitted. An alternative (leaving the decision to the compiler)
>> might be to drop the if() and use "=g" as constraint.
>>
>> I was considering to omit the assignment to sel_ on the if() branch,
>> expecting the compiler to then flag uses of the return value (as
>> consuming uninitialized data) when a 2nd argument is passed. However,
>> gcc14 then already flags the "sel_;" at the end of the macro as
>> consuming uninitialized data.
>>
>> --- a/xen/arch/x86/include/asm/regs.h
>> +++ b/xen/arch/x86/include/asm/regs.h
>> @@ -16,10 +16,20 @@
>>  !diff || ((r)->cs != __HYPERVISOR_CS);  
>>   \
>>  })
>>  
>> -#define read_sreg(name) ({   \
>> -unsigned int __sel;  \
>> -asm ( "mov %%" STR(name) ",%0" : "=r" (__sel) ); \
>> -__sel;   \
>> +#define read_sreg(name, dst...) ({   \
>> +unsigned int sel_;   \
>> +BUILD_BUG_ON(count_args(dst) > 1);   \
>> +if ( count_args(dst) )   \
>> +{\
>> +typeof(LASTARG(&sel_, ## dst)) dst_ =\
>> +LASTARG(&sel_, ## dst);  \
>> +asm ( "mov %%" STR(name) ",%0" : "=m" (*dst_) ); \
>> +/* The compiler ought to optimize this out. */   \
>> +sel_ = *dst_;\
>> +}\
>> +else \
>> +asm ( "mov %%" STR(name) ",%0" : "=r" (sel_) );  \
>> +sel_;\
>>  })
> 
> This doesn't fix the register promotion problem.  That can be fixed by
> unsigned long rather than int, as you did for rdmsr. 
> https://godbolt.org/z/K5hKz7KvM

Right, but that's an orthogonal aspect.

> But the fundamental problem is that the sreg instructions with mem16
> encodings are weird.  They don't even follow normal x86 rules for
> operand size.
> 
> By the end of the FRED series (for which this patch was misc cleanup),
> I've almost removed read_sreg(), and was intending to purge it
> completely.

Well, if that's the plan, then ...

>  Even in it's current form, it's not normal C semantics,
> because it looks to take a variable which isn't a variable.
> 
> Clever as this trick is, I feel it's a backwards step in terms of
> legibility, and that plain asm()'s are the lesser evil when it comes to
> mem16 instructions.

... indeed I agree here.

Jan

Re: [PATCH v2 07/16] exec/exec-all: remove dependency on cpu.h

2025-03-11 Thread Pierrick Bouvier


On 3/11/25 00:26, Philippe Mathieu-Daudé wrote:

On 11/3/25 05:08, Pierrick Bouvier wrote:

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 


Missing the "why" justification we couldn't do that before.


---
   include/exec/exec-all.h | 1 -
   1 file changed, 1 deletion(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index dd5c40f2233..19b0eda44a7 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -20,7 +20,6 @@
   #ifndef EXEC_ALL_H
   #define EXEC_ALL_H
   
-#include "cpu.h"

   #if defined(CONFIG_USER_ONLY)
   #include "exec/cpu_ldst.h"
   #endif




Previous commit is named:
codebase: prepare to remove cpu.h from exec/exec-all.h
So before those changes, it's not possible.

I can repeat that here, or squash the patches together, as you prefer.

Re: [PATCH v2 14/16] include/exec/memory: extract devend_big_endian from devend_memop

2025-03-11 Thread Pierrick Bouvier


On 3/11/25 00:36, Philippe Mathieu-Daudé wrote:

On 11/3/25 05:08, Pierrick Bouvier wrote:

we'll use it in system/memory.c.


Having part of the commit description separated in its subject is a
bit annoying. But then I'm probably using 20-years too old tools in
my patch workflow.

Only used in system/{memory,physmem}.c, worth move to a local
system/memory-internal.h header? Or even simpler, move
include/exec/memory-internal.h -> exec/memory-internal.h first.



Good point, I'll move them to the existing exec/memory-internal.h in an 
additional commit.



Signed-off-by: Pierrick Bouvier 
---
   include/exec/memory.h | 18 --
   1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 60c0fb6ccd4..57661283684 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3138,16 +3138,22 @@ address_space_write_cached(MemoryRegionCache *cache, 
hwaddr addr,
   MemTxResult address_space_set(AddressSpace *as, hwaddr addr,
 uint8_t c, hwaddr len, MemTxAttrs attrs);
   
-/* enum device_endian to MemOp.  */

-static inline MemOp devend_memop(enum device_endian end)
+/* returns true if end is big endian. */
+static inline bool devend_big_endian(enum device_endian end)
   {
   QEMU_BUILD_BUG_ON(DEVICE_HOST_ENDIAN != DEVICE_LITTLE_ENDIAN &&
 DEVICE_HOST_ENDIAN != DEVICE_BIG_ENDIAN);
   
-bool big_endian = (end == DEVICE_NATIVE_ENDIAN

-   ? target_words_bigendian()
-   : end == DEVICE_BIG_ENDIAN);
-return big_endian ? MO_BE : MO_LE;
+if (end == DEVICE_NATIVE_ENDIAN) {
+return target_words_bigendian();
+}
+return end == DEVICE_BIG_ENDIAN;
+}
+
+/* enum device_endian to MemOp.  */
+static inline MemOp devend_memop(enum device_endian end)
+{
+return devend_big_endian(end) ? MO_BE : MO_LE;
   }
   
   /*

Re: [PATCH v2 11/16] exec/ram_addr: call xen_hvm_modified_memory only if xen is enabled

2025-03-11 Thread Pierrick Bouvier


On 3/11/25 00:29, Philippe Mathieu-Daudé wrote:

On 11/3/25 05:08, Pierrick Bouvier wrote:

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 


I didn't follow this direction because Richard had a preference
on removing unnecessary inlined functions:
https://lore.kernel.org/qemu-devel/9151205a-13d3-401e-b403-f9195cdb1...@linaro.org/



The patch you mention was moving code, which can be arguably different 
from simply editing existing one.
That said, and even though the concern is real, I would appreciate to 
keep this series focused on achieving the goal, and not doing a refactor 
of the involved headers.



---
   include/exec/ram_addr.h | 8 ++--
   1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 7c011fadd11..098fccb5835 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -342,7 +342,9 @@ static inline void 
cpu_physical_memory_set_dirty_range(ram_addr_t start,
   }
   }
   
-xen_hvm_modified_memory(start, length);

+if (xen_enabled()) {
+xen_hvm_modified_memory(start, length);
+}
   }

Re: [PATCH v2 14/16] include/exec/memory: extract devend_big_endian from devend_memop

2025-03-11 Thread Pierrick Bouvier


On 3/11/25 00:36, Philippe Mathieu-Daudé wrote:

On 11/3/25 05:08, Pierrick Bouvier wrote:

we'll use it in system/memory.c.


Having part of the commit description separated in its subject is a
bit annoying. But then I'm probably using 20-years too old tools in
my patch workflow.


Can you please share what would be the message you (or the tool) would 
prefer in this case?


It's just a single line subject (saying "we extract the function") + an 
additional line justifying why it's needed.




Only used in system/{memory,physmem}.c, worth move to a local
system/memory-internal.h header? Or even simpler, move
include/exec/memory-internal.h -> exec/memory-internal.h first.


Signed-off-by: Pierrick Bouvier 
---
   include/exec/memory.h | 18 --
   1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 60c0fb6ccd4..57661283684 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3138,16 +3138,22 @@ address_space_write_cached(MemoryRegionCache *cache, 
hwaddr addr,
   MemTxResult address_space_set(AddressSpace *as, hwaddr addr,
 uint8_t c, hwaddr len, MemTxAttrs attrs);
   
-/* enum device_endian to MemOp.  */

-static inline MemOp devend_memop(enum device_endian end)
+/* returns true if end is big endian. */
+static inline bool devend_big_endian(enum device_endian end)
   {
   QEMU_BUILD_BUG_ON(DEVICE_HOST_ENDIAN != DEVICE_LITTLE_ENDIAN &&
 DEVICE_HOST_ENDIAN != DEVICE_BIG_ENDIAN);
   
-bool big_endian = (end == DEVICE_NATIVE_ENDIAN

-   ? target_words_bigendian()
-   : end == DEVICE_BIG_ENDIAN);
-return big_endian ? MO_BE : MO_LE;
+if (end == DEVICE_NATIVE_ENDIAN) {
+return target_words_bigendian();
+}
+return end == DEVICE_BIG_ENDIAN;
+}
+
+/* enum device_endian to MemOp.  */
+static inline MemOp devend_memop(enum device_endian end)
+{
+return devend_big_endian(end) ? MO_BE : MO_LE;
   }
   
   /*

Re: [PATCH v6 4/4] CHANGELOG.md: Mention stack-protector feature

2025-03-11 Thread Oleksii Kurochko




On 2/17/25 3:49 AM, Volodymyr Babchuk wrote:

Stack protector is meant to be enabled on all architectures, but
currently it is tested (and enabled) only on ARM, so mention it in ARM
section.

Signed-off-by: Volodymyr Babchuk 

---

TODO: If this patch will not make into 4.20 - rework it by mentioning
a correct version.

Changes in v6:

  - Dropped Andrew's R-b tag because there is little chance that this
  series will be included in 4.20, so this patch should be reworked for
  4.21
---
  CHANGELOG.md | 1 +


Acked-by: Oleksii Kurochko 

~ Oleksii


  1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1de1d1eca1..4cac4079f0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -23,6 +23,7 @@ The format is based on [Keep a 
Changelog](https://keepachangelog.com/en/1.0.0/)
 - Basic handling for SCMI requests over SMC using Shared Memory, by 
allowing
   forwarding the calls to EL3 FW if coming from hwdom.
 - Support for LLC (Last Level Cache) coloring.
+   - Ability to enable stack protector
   - On x86:
 - xl suspend/resume subcommands.
 - `wallclock` command line option to select time source.

[PATCH v4 2/3] x86/hvm: check return code of hvm_pi_update_irte when binding

2025-03-11 Thread Roger Pau Monne

Consume the return code from hvm_pi_update_irte(), and propagate the error
back to the caller if hvm_pi_update_irte() fails.

Fixes: 35a1caf8b6b5 ('pass-through: update IRTE according to guest interrupt 
config changes')
Signed-off-by: Roger Pau Monné 
---
Changes since v3:
 - New in this version.
---
 xen/drivers/passthrough/x86/hvm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/x86/hvm.c 
b/xen/drivers/passthrough/x86/hvm.c
index f5faff7a499a..47de6953fdf8 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -381,7 +381,15 @@ int pt_irq_create_bind(
 
 /* Use interrupt posting if it is supported. */
 if ( iommu_intpost )
-hvm_pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
+{
+rc = hvm_pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
+
+if ( rc )
+{
+pt_irq_destroy_bind(d, pt_irq_bind);
+return rc;
+}
+}
 
 if ( pt_irq_bind->u.msi.gflags & XEN_DOMCTL_VMSI_X86_UNMASKED )
 {
-- 
2.48.1

Re: [PATCH 11/23] tools/xenstored: Automatically set dom0_domid and priv_domid

2025-03-11 Thread Jürgen Groß


On 10.03.25 15:32, Jason Andryuk wrote:



On 2025-03-08 02:02, Jürgen Groß wrote:

On 06.03.25 23:03, Jason Andryuk wrote:

With split hardware and control domains, each domain should be
privileged with respect to xenstore.  When adding domains to xenstore,
look at their privilege and add them to xenstored as appropriate.
dom0_domid is used for the hardware domain, and priv_domid is used for a
control domain.

Only one of each is allowed for now.

Signed-off-by: Jason Andryuk 
---
  tools/xenstored/domain.c | 16 
  1 file changed, 16 insertions(+)

diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c
index 64c8fd0cc3..f2394cd6e9 100644
--- a/tools/xenstored/domain.c
+++ b/tools/xenstored/domain.c
@@ -795,6 +795,20 @@ static struct domain 
*find_or_alloc_existing_domain(unsigned int domid)

  return domain;
  }
+static void domain_set_privileged(struct domain *domain)
+{
+    xc_domaininfo_t dominfo;
+
+    if ( !get_domain_info(domain->domid, &dominfo) )
+    return;
+
+    if ( dominfo.flags & XEN_DOMINF_priv )
+    priv_domid = domain->domid;
+
+    if ( dominfo.flags & XEN_DOMINF_hardware )
+    dom0_domid = domain->domid;
+}


Please no use of libxenctrl. I have worked hard to eliminate the usage
in order to enable a xenstore-stubdom being used across Xen versions
(C Xenstore is relying on stable hypercalls only now).


Right.  Yes, nice work on switching to stable hypercalls.


You need to add the needed flags to the rather new stable domctl
XEN_DOMCTL_get_domain_state and to libxenmanage.


Ok.


+
  static int new_domain(struct domain *domain, int port, bool restore)
  {
  int rc;
@@ -831,6 +845,8 @@ static int new_domain(struct domain *domain, int port, 
bool restore)

  domain->conn->domain = domain;
  domain->conn->id = domain->domid;
+    domain_set_privileged(domain);


The name implies you are changing the domain to be privileged, but this
is done conditionally only.

Maybe name the function domain_apply_privileges()?


I'm thinking domain_apply_capabilities() since they are being referred to as 
capabilities.


But I'll have to revisit this.  To make xenstored "just work" when domid != 0, 
it should auto-detect its domain id, and that has to be done earlier than this.


I think with the 3 remaining patches of my series [1] this should be handled
already.

[1]: https://lists.xen.org/archives/html/xen-devel/2025-02/msg00120.html
 https://lists.xen.org/archives/html/xen-devel/2025-02/msg00121.html
 https://lists.xen.org/archives/html/xen-devel/2025-02/msg00122.html


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature

[PATCH v2 1/7] xen/arm: allow PCI host bridge to have private data

2025-03-11 Thread Mykyta Poturai

From: Oleksandr Andrushchenko 

Some of the PCI host bridges require private data. Create a generic
approach for that, so such bridges may request the private data to
be allocated during initialization.

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Mykyta Poturai 
---
v1->v2:
* no change
---
 xen/arch/arm/include/asm/pci.h  |  4 +++-
 xen/arch/arm/pci/pci-host-common.c  | 18 +-
 xen/arch/arm/pci/pci-host-generic.c |  2 +-
 xen/arch/arm/pci/pci-host-zynqmp.c  |  2 +-
 4 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/include/asm/pci.h b/xen/arch/arm/include/asm/pci.h
index 7f77226c9b..44344ac8c1 100644
--- a/xen/arch/arm/include/asm/pci.h
+++ b/xen/arch/arm/include/asm/pci.h
@@ -66,6 +66,7 @@ struct pci_host_bridge {
 uint16_t segment;/* Segment number */
 struct pci_config_window* cfg;   /* Pointer to the bridge config window */
 const struct pci_ops *ops;
+void *priv;  /* Private data of the bridge. */
 };
 
 struct pci_ops {
@@ -95,7 +96,8 @@ struct pci_ecam_ops {
 extern const struct pci_ecam_ops pci_generic_ecam_ops;
 
 int pci_host_common_probe(struct dt_device_node *dev,
-  const struct pci_ecam_ops *ops);
+  const struct pci_ecam_ops *ops,
+  size_t priv_sz);
 int pci_generic_config_read(struct pci_host_bridge *bridge, pci_sbdf_t sbdf,
 uint32_t reg, uint32_t len, uint32_t *value);
 int pci_generic_config_write(struct pci_host_bridge *bridge, pci_sbdf_t sbdf,
diff --git a/xen/arch/arm/pci/pci-host-common.c 
b/xen/arch/arm/pci/pci-host-common.c
index c0faf0f436..be7e6c3510 100644
--- a/xen/arch/arm/pci/pci-host-common.c
+++ b/xen/arch/arm/pci/pci-host-common.c
@@ -209,7 +209,8 @@ static int pci_bus_find_domain_nr(struct dt_device_node 
*dev)
 }
 
 int pci_host_common_probe(struct dt_device_node *dev,
-  const struct pci_ecam_ops *ops)
+  const struct pci_ecam_ops *ops,
+  size_t priv_sz)
 {
 struct pci_host_bridge *bridge;
 struct pci_config_window *cfg;
@@ -241,11 +242,26 @@ int pci_host_common_probe(struct dt_device_node *dev,
 printk(XENLOG_ERR "Inconsistent \"linux,pci-domain\" property in 
DT\n");
 BUG();
 }
+
 bridge->segment = domain;
+
+if ( priv_sz )
+{
+bridge->priv = xzalloc_bytes(priv_sz);
+if ( !bridge->priv )
+{
+err = -ENOMEM;
+goto err_priv;
+}
+}
+
 pci_add_host_bridge(bridge);
 
 return 0;
 
+err_priv:
+xfree(bridge->priv);
+
 err_exit:
 xfree(bridge);
 
diff --git a/xen/arch/arm/pci/pci-host-generic.c 
b/xen/arch/arm/pci/pci-host-generic.c
index 46de6e43cc..cc4bc70684 100644
--- a/xen/arch/arm/pci/pci-host-generic.c
+++ b/xen/arch/arm/pci/pci-host-generic.c
@@ -29,7 +29,7 @@ static const struct dt_device_match __initconstrel 
gen_pci_dt_match[] =
 static int __init pci_host_generic_probe(struct dt_device_node *dev,
  const void *data)
 {
-return pci_host_common_probe(dev, &pci_generic_ecam_ops);
+return pci_host_common_probe(dev, &pci_generic_ecam_ops, 0);
 }
 
 DT_DEVICE_START(pci_gen, "PCI HOST GENERIC", DEVICE_PCI_HOSTBRIDGE)
diff --git a/xen/arch/arm/pci/pci-host-zynqmp.c 
b/xen/arch/arm/pci/pci-host-zynqmp.c
index 101edb8593..985a43c516 100644
--- a/xen/arch/arm/pci/pci-host-zynqmp.c
+++ b/xen/arch/arm/pci/pci-host-zynqmp.c
@@ -47,7 +47,7 @@ static const struct dt_device_match __initconstrel 
nwl_pcie_dt_match[] =
 static int __init pci_host_generic_probe(struct dt_device_node *dev,
  const void *data)
 {
-return pci_host_common_probe(dev, &nwl_pcie_ops);
+return pci_host_common_probe(dev, &nwl_pcie_ops, 0);
 }
 
 DT_DEVICE_START(pci_gen, "PCI HOST ZYNQMP", DEVICE_PCI_HOSTBRIDGE)
-- 
2.34.1

[PATCH v3 3/3] xen: mem_access: conditionally compile vm_event.c & monitor.c

2025-03-11 Thread Sergiy Kibrik

From: Stefano Stabellini 

Extend coverage of CONFIG_VM_EVENT option and make the build of VM events
and monitoring support optional. Also make MEM_PAGING option depend on VM_EVENT
to document that mem_paging is relying on vm_event.
This is to reduce code size on Arm when this option isn't enabled.

CC: Jan Beulich 
CC: Tamas K Lengyel 
Reviewed-by: Ayan Kumar Halder 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Sergiy Kibrik 
---
changes in v3:
 - add dependency MEM_PAGING -> VM_EVENT
 - monitor_domctl() stub returns -EOPNOTSUPP
changes in v2:
 - rename CONFIG_MEM_ACCESS -> CONFIG_VM_EVENT
 - tags
---
 xen/arch/arm/Makefile  |  4 ++--
 xen/arch/arm/vsmc.c|  3 ++-
 xen/arch/x86/Kconfig   |  2 +-
 xen/common/Makefile|  4 ++--
 xen/include/xen/monitor.h  |  9 +
 xen/include/xen/vm_event.h | 14 +++---
 6 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 3bd5125e64..4837ad467a 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -40,7 +40,7 @@ obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-$(CONFIG_LLC_COLORING) += llc-coloring.o
 obj-$(CONFIG_VM_EVENT) += mem_access.o
 obj-y += mm.o
-obj-y += monitor.o
+obj-$(CONFIG_VM_EVENT) += monitor.o
 obj-y += p2m.o
 obj-y += platform.o
 obj-y += platform_hypercall.o
@@ -66,7 +66,7 @@ obj-$(CONFIG_VGICV2) += vgic-v2.o
 obj-$(CONFIG_GICV3) += vgic-v3.o
 obj-$(CONFIG_HAS_ITS) += vgic-v3-its.o
 endif
-obj-y += vm_event.o
+obj-$(CONFIG_VM_EVENT) += vm_event.o
 obj-y += vtimer.o
 obj-$(CONFIG_SBSA_VUART_CONSOLE) += vpl011.o
 obj-y += vsmc.o
diff --git a/xen/arch/arm/vsmc.c b/xen/arch/arm/vsmc.c
index e253865b6c..6081f14ed0 100644
--- a/xen/arch/arm/vsmc.c
+++ b/xen/arch/arm/vsmc.c
@@ -330,7 +330,8 @@ void do_trap_smc(struct cpu_user_regs *regs, const union 
hsr hsr)
 }
 
 /* If monitor is enabled, let it handle the call. */
-if ( current->domain->arch.monitor.privileged_call_enabled )
+if ( IS_ENABLED(CONFIG_VM_EVENT) &&
+ current->domain->arch.monitor.privileged_call_enabled )
 rc = monitor_smc();
 
 if ( rc == 1 )
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 6e41bc0fb4..f086799594 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -350,7 +350,7 @@ endif
 
 config MEM_PAGING
bool "Xen memory paging support (UNSUPPORTED)" if UNSUPPORTED
-   depends on HVM
+   depends on HVM && VM_EVENT
 
 config MEM_SHARING
bool "Xen memory sharing support (UNSUPPORTED)" if UNSUPPORTED
diff --git a/xen/common/Makefile b/xen/common/Makefile
index b71d4b3efa..ac23120d7d 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -54,7 +54,7 @@ obj-y += timer.o
 obj-$(CONFIG_TRACEBUFFER) += trace.o
 obj-y += version.o
 obj-y += virtual_region.o
-obj-y += vm_event.o
+obj-$(CONFIG_VM_EVENT) += vm_event.o
 obj-$(CONFIG_HAS_VMAP) += vmap.o
 obj-y += vsprintf.o
 obj-y += wait.o
@@ -68,7 +68,7 @@ obj-$(CONFIG_COMPAT) += $(addprefix compat/,domain.o memory.o 
multicall.o xlat.o
 
 ifneq ($(CONFIG_PV_SHIM_EXCLUSIVE),y)
 obj-y += domctl.o
-obj-y += monitor.o
+obj-$(CONFIG_VM_EVENT) += monitor.o
 obj-y += sysctl.o
 endif
 
diff --git a/xen/include/xen/monitor.h b/xen/include/xen/monitor.h
index 713d54f7c1..634b6cd2a1 100644
--- a/xen/include/xen/monitor.h
+++ b/xen/include/xen/monitor.h
@@ -27,8 +27,17 @@
 struct domain;
 struct xen_domctl_monitor_op;
 
+#ifdef CONFIG_VM_EVENT
 int monitor_domctl(struct domain *d, struct xen_domctl_monitor_op *mop);
 void monitor_guest_request(void);
+#else
+static inline int monitor_domctl(struct domain *d,
+ struct xen_domctl_monitor_op *mop)
+{
+return -EOPNOTSUPP;
+}
+static inline void monitor_guest_request(void) {}
+#endif
 
 int monitor_traps(struct vcpu *v, bool sync, vm_event_request_t *req);
 
diff --git a/xen/include/xen/vm_event.h b/xen/include/xen/vm_event.h
index 9a86358b42..268c85fc4f 100644
--- a/xen/include/xen/vm_event.h
+++ b/xen/include/xen/vm_event.h
@@ -50,9 +50,6 @@ struct vm_event_domain
 unsigned int last_vcpu_wake_up;
 };
 
-/* Clean up on domain destruction */
-void vm_event_cleanup(struct domain *d);
-
 /* Returns whether a ring has been set up */
 bool vm_event_check_ring(struct vm_event_domain *ved);
 
@@ -88,7 +85,18 @@ void vm_event_cancel_slot(struct domain *d, struct 
vm_event_domain *ved);
 void vm_event_put_request(struct domain *d, struct vm_event_domain *ved,
   vm_event_request_t *req);
 
+#ifdef CONFIG_VM_EVENT
+/* Clean up on domain destruction */
+void vm_event_cleanup(struct domain *d);
 int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec);
+#else
+static inline void vm_event_cleanup(struct domain *d) {}
+static inline int vm_event_domctl(struct domain *d,
+  struct xen_domctl_vm_event_op *vec)
+{
+return -EINVAL;
+}
+#endif
 
 void vm_event_vcpu_pause(struct vcpu *v);
 void vm_event_vcpu_unpause(struct

[PATCH v2] tools: Mark ACPI SDTs as NVS in the PVH build path

2025-03-11 Thread Alejandro Vallejo

Commit cefeffc7e583 marked ACPI tables as NVS in the hvmloader path
because SeaBIOS may otherwise just mark it as RAM. There is, however,
yet another reason to do it even in the PVH path. Xen's incarnation of
AML relies on having access to some ACPI tables (e.g: _STA of Processor
objects relies on reading the processor online bit in its MADT entry)

This is problematic if the OS tries to reclaim ACPI memory for page
tables as it's needed for runtime and can't be reclaimed after the OSPM
is up and running.

Fixes: de6d188a519f("hvmloader: flip "ACPI data" to "ACPI NVS" type for ACPI 
table region)"
Signed-off-by: Alejandro Vallejo 
---
v1->v2:
  * Copy explanatory comment in hvmloader/e820.c to its libxl_x86.c counterpart

---
 tools/firmware/hvmloader/e820.c |  4 
 tools/libs/light/libxl_x86.c| 17 -
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index c490a0bc790c..86d39544e887 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -210,6 +210,10 @@ int build_e820_table(struct e820entry *e820,
  * space reuse by an ACPI unaware / buggy bootloader, option ROM, etc.
  * before an ACPI OS takes control. This is possible due to the fact that
  * ACPI NVS memory is explicitly described as non-reclaimable in ACPI spec.
+ *
+ * Furthermore, Xen relies on accessing ACPI tables from within the AML
+ * code exposed to guests. So Xen's ACPI tables are not, in general,
+ * reclaimable.
  */
 
 if ( acpi_enabled )
diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index a3164a3077fe..2ba96d12e595 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -737,12 +737,27 @@ static int domain_construct_memmap(libxl__gc *gc,
 nr++;
 }
 
+/*
+ * Mark populated reserved memory that contains ACPI tables as ACPI NVS.
+ * That should help the guest to treat it correctly later: e.g. pass to
+ * the next kernel on kexec.
+ *
+ * Using NVS type instead of a regular one helps to prevent potential
+ * space reuse by an ACPI unaware / buggy bootloader, option ROM, etc.
+ * before an ACPI OS takes control. This is possible due to the fact that
+ * ACPI NVS memory is explicitly described as non-reclaimable in ACPI spec.
+ *
+ * Furthermore, Xen relies on accessing ACPI tables from within the AML
+ * code exposed to guests. So Xen's ACPI tables are not, in general,
+ * reclaimable.
+ */
+
 for (i = 0; i < MAX_ACPI_MODULES; i++) {
 if (dom->acpi_modules[i].length) {
 e820[nr].addr = dom->acpi_modules[i].guest_addr_out & ~(page_size 
- 1);
 e820[nr].size = dom->acpi_modules[i].length +
 (dom->acpi_modules[i].guest_addr_out & (page_size - 1));
-e820[nr].type = E820_ACPI;
+e820[nr].type = E820_NVS;
 nr++;
 }
 }
-- 
2.48.1

[PATCH v3 2/3] x86:monitor: control monitor.c build with CONFIG_VM_EVENT option

2025-03-11 Thread Sergiy Kibrik

Replace more general CONFIG_HVM option with CONFIG_VM_EVENT which is more
relevant and specific to monitoring. This is only to clarify at build level
to which subsystem this file belongs.

No functional change here, as VM_EVENT depends on HVM.

Acked-by: Jan Beulich 
Signed-off-by: Sergiy Kibrik 
---
 xen/arch/x86/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index c763f80b0b..f59c9665fd 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -49,7 +49,7 @@ obj-$(CONFIG_PV) += ioport_emulate.o
 obj-y += irq.o
 obj-$(CONFIG_KEXEC) += machine_kexec.o
 obj-y += mm.o x86_64/mm.o
-obj-$(CONFIG_HVM) += monitor.o
+obj-$(CONFIG_VM_EVENT) += monitor.o
 obj-y += mpparse.o
 obj-y += nmi.o
 obj-y += numa.o
-- 
2.25.1

Re: [RFC PATCH v3 1/7] xen/arm: add generic SCI subsystem

2025-03-11 Thread Jan Beulich

On 11.03.2025 12:16, Grygorii Strashko wrote:
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -526,6 +526,12 @@ S:   Supported
>  F:   xen/arch/arm/include/asm/tee/
>  F:   xen/arch/arm/tee/
>  
> +SCI MEDIATORS
> +M:   Oleksii Moisieiev 
> +S:   Supported
> +F:   xen/arch/arm/sci
> +F:   xen/include/asm-arm/sci
> +
>  TOOLSTACK
>  M:   Anthony PERARD 
>  S:   Supported

Please can you respect alphabetical sorting in this file?

> @@ -851,6 +852,18 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
> u_domctl)
>  case XEN_DOMCTL_deassign_device:
>  case XEN_DOMCTL_get_device_group:
>  ret = iommu_do_domctl(op, d, u_domctl);
> +
> +if ( ret >= 0 || (ret == -EOPNOTSUPP) || (ret == -ENODEV) )
> +{
> +/*
> + * TODO: RFC
> + * This change will allow to pass DT nodes/devices to
> + * XEN_DOMCTL_assign_device OP using xl.cfg:"dtdev" property even
> + * if those DT nodes/devices even are not behind IOMMU (or IOMMU
> + * is disabled) without failure.
> + */
> +ret = sci_do_domctl(op, d, u_domctl);
> +}
>  break;

Despite the comment I fear I don't understand what you're trying to do here.
In any event you may not clobber the original "ret", if all you do is some
"add-on".

> --- a/xen/include/asm-generic/device.h
> +++ b/xen/include/asm-generic/device.h
> @@ -18,6 +18,7 @@ enum device_class
>  DEVICE_IOMMU,
>  DEVICE_INTERRUPT_CONTROLLER,
>  DEVICE_PCI_HOSTBRIDGE,
> +DEVICE_ARM_SCI,
>  /* Use for error */
>  DEVICE_UNKNOWN,

This is a generic header - I don't see how anything arch-specific could
validly end up here.

> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -327,6 +327,8 @@ DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t);
>  #define XEN_DOMCTL_CONFIG_TEE_OPTEE 1
>  #define XEN_DOMCTL_CONFIG_TEE_FFA   2
>  
> +#define XEN_DOMCTL_CONFIG_ARM_SCI_NONE  0
> +
>  struct xen_arch_domainconfig {
>  /* IN/OUT */
>  uint8_t gic_version;
> @@ -350,6 +352,8 @@ struct xen_arch_domainconfig {
>   *
>   */
>  uint32_t clock_frequency;
> +/* IN */
> +uint8_t arm_sci_type;
>  };

You're not re-using a pre-existing padding field, so I don't see how you
can get away without bumping XEN_DOMCTL_INTERFACE_VERSION.

Jan

Re: [PATCH 1/3] x86/P2M: synchronize fast and slow paths of p2m_get_page_from_gfn()

2025-03-11 Thread Jan Beulich

On 10.03.2025 15:53, Roger Pau Monné wrote:
> On Wed, Feb 26, 2025 at 12:52:27PM +0100, Jan Beulich wrote:
>> Handling of both grants and foreign pages was different between the two
>> paths.
>>
>> While permitting access to grants would be desirable, doing so would
>> require more involved handling; undo that for the time being. In
>> particular the page reference obtained would prevent the owning domain
>> from changing e.g. the page's type (after the grantee has released the
>> last reference of the grant). Instead perhaps another reference on the
>> grant would need obtaining. Which in turn would require determining
>> which grant that was.
>>
>> Foreign pages in any event need permitting on both paths.
>>
>> Introduce a helper function to be used on both paths, such that
>> respective checking differs in just the extra "to be unshared" condition
>> on the fast path.
>>
>> While there adjust the sanity check for foreign pages: Don't leak the
>> reference on release builds when on a debug build the assertion would
>> have triggered. (Thanks to Roger for the suggestion.)
>>
>> Fixes: 80ea7af17269 ("x86/mm: Introduce get_page_from_gfn()")
>> Fixes: 50fe6e737059 ("pvh dom0: add and remove foreign pages")
>> Fixes: cbbca7be4aaa ("x86/p2m: make p2m_get_page_from_gfn() handle grant 
>> case correctly")
>> Signed-off-by: Jan Beulich 
> 
> Just a couple of nits below (with a reply to your RFC).
> 
>> ---
>> RFC: While the helper could take const struct domain * as first
>>  parameter, for a P2M function it seemed more natural to have it
>>  take const struct p2m_domain *.
>>
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -328,12 +328,45 @@ void p2m_put_gfn(struct p2m_domain *p2m,
>>  gfn_unlock(p2m, gfn_x(gfn), 0);
>>  }
>>  
>> +static struct page_info *get_page_from_mfn_and_type(
>> +const struct p2m_domain *p2m, mfn_t mfn, p2m_type_t t)
> 
> Re your RFC: since it's a static function, just used for
> p2m_get_page_from_gfn(), I would consider passing a domain instead of
> a p2m_domain, as the solely usage of p2m is to obtain the domain.

Okay, will do.

>> +{
>> +struct page_info *page;
>> +
>> +if ( !mfn_valid(mfn) )
>> +return NULL;
>> +
>> +page = mfn_to_page(mfn);
>> +
>> +if ( p2m_is_ram(t) )
> 
> Should this be a likely() to speed up the common successful path?

Well. Andrew's general advice looks to be to avoid likely() / unlikely()
in almost all situations. Therefore unless he positively indicates that
in a case like this using likely() is acceptable, I'd rather stay away
from adding that.

docs/process/coding-best-practices.pandoc could certainly do with some
rough guidelines on when adding these two is acceptable (or even
desirable). Right now to me it is largely unclear when their use is
deemed okay; it certainly doesn't match anymore what I was told some
20 years ago when I started working on Linux.

Re: [PATCH v2 1/5] xen/arm: Create tee command line parameter

2025-03-11 Thread Jan Beulich

On 10.03.2025 15:50, Bertrand Marquis wrote:
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -2651,6 +2651,20 @@ Specify the per-cpu trace buffer size in pages.
>  
>  Flag to enable TSC deadline as the APIC timer mode.
>  
> +### tee
> +> `= `

This wants an arch restriction, like we have for other command line options
supported only by one arch.

Jan

> +Specify the TEE mediator to be probed and use.
> +
> +The default behaviour is to probe all supported TEEs supported by Xen and use
> +the first one successfully probed. When this parameter is passed, Xen will
> +probe only the TEE mediator passed as argument and boot will fail if this
> +mediator is not properly probed or if the requested TEE is not supported by
> +Xen.
> +
> +This parameter can be set to `optee` of `ffa` if the corresponding mediators
> +are compiled in.
> +
>  ### tevt_mask
>  > `= `
>

[PATCH v4 0/3] x86/pci: reduce PCI accesses

2025-03-11 Thread Roger Pau Monne

Hello,

Patches 1 and 2 are new bugfixes in this version.  Patch 3 is still
mostly as v3, just with the extra logic to ensure vmx_pi_update_irte()
correctness.

Thanks, Roger.

Roger Pau Monne (3):
  x86/vmx: fix posted interrupts usage of msi_desc->msg field
  x86/hvm: check return code of hvm_pi_update_irte when binding
  x86/iommu: avoid MSI address and data writes if IRT index hasn't
changed

 xen/arch/x86/hpet.c  |  6 +-
 xen/arch/x86/hvm/vmx/vmx.c   | 20 +++-
 xen/arch/x86/include/asm/msi.h   |  2 +-
 xen/arch/x86/msi.c   | 11 ++-
 xen/drivers/passthrough/amd/iommu_intr.c |  4 ++--
 xen/drivers/passthrough/vtd/intremap.c   |  4 +++-
 xen/drivers/passthrough/x86/hvm.c| 10 +-
 xen/include/xen/iommu.h  |  6 ++
 8 files changed, 51 insertions(+), 12 deletions(-)

-- 
2.48.1

Re: [PATCH v2] xen/page_alloc: Simplify domain_adjust_tot_pages

2025-03-11 Thread Roger Pau Monné

On Tue, Mar 04, 2025 at 11:10:00AM +, Alejandro Vallejo wrote:
> The logic has too many levels of indirection and it's very hard to
> understand it its current form. Split it between the corner case where
> the adjustment is bigger than the current claim and the rest to avoid 5
> auxiliary variables.
> 
> Add a functional change to prevent negative adjustments from
> re-increasing the claim. This has the nice side effect of avoiding
> taking the heap lock here on every free.
> 
> While at it, fix incorrect field name in nearby comment.
> 
> Signed-off-by: Alejandro Vallejo 

Acked-by: Roger Pau Monné 

I think it would be nice to also ensure that once the domain is
finished building (maybe when it's unpaused for the first
time?) d->outstanding_pages is set to 0.  IMO the claim system was
designed to avoid races during domain building, and shouldn't be used
once the domain is already running.

Thanks, Roger.

[PATCH v2 3/7] xen/arm: add support for PCI child bus

2025-03-11 Thread Mykyta Poturai

From: Oleksandr Andrushchenko 

PCI host bridges often have different ways to access the root and child
bus configuration spaces. One of the examples is Designware's host bridge
and its multiple clones [1].

Linux kernel implements this by instantiating a child bus when device
drivers provide not only the usual pci_ops to access ECAM space (this is
the case for the generic host bridge), but also means to access the child
bus which has a dedicated configuration space and own implementation for
accessing the bus, e.g. child_ops.

For Xen it is not feasible to fully implement PCI bus infrastructure as
Linux kernel does, but still child bus can be supported.

Add support for the PCI child bus which includes the following changes:
- introduce bus mapping functions depending on SBDF
- assign bus start and end for the child bus and re-configure the same for
  the parent (root) bus
- make pci_find_host_bridge be aware of multiple busses behind the same bridge
- update pci_host_bridge_mappings, so it also doesn't map to guest the memory
  spaces belonging to the child bus
- make pci_host_common_probe accept one more pci_ops structure for the child bus
- install MMIO handlers for the child bus
- re-work vpci_mmio_{write|read} with parent and child approach in mind

[1] https://elixir.bootlin.com/linux/v5.15/source/drivers/pci/controller/dwc

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Mykyta Poturai 
---
v1->v2:
* fixed compilation issues
---
 xen/arch/arm/include/asm/pci.h  | 14 -
 xen/arch/arm/pci/ecam.c | 17 --
 xen/arch/arm/pci/pci-access.c   | 37 +++--
 xen/arch/arm/pci/pci-host-common.c  | 86 +++--
 xen/arch/arm/pci/pci-host-generic.c |  2 +-
 xen/arch/arm/pci/pci-host-zynqmp.c  |  2 +-
 xen/arch/arm/vpci.c | 83 ++--
 7 files changed, 192 insertions(+), 49 deletions(-)

diff --git a/xen/arch/arm/include/asm/pci.h b/xen/arch/arm/include/asm/pci.h
index e1f63d75e3..0012c2ae9e 100644
--- a/xen/arch/arm/include/asm/pci.h
+++ b/xen/arch/arm/include/asm/pci.h
@@ -69,6 +69,9 @@ struct pci_host_bridge {
 struct pci_config_window* cfg;   /* Pointer to the bridge config window */
 const struct pci_ops *ops;
 void *priv;  /* Private data of the bridge. */
+/* Child bus */
+struct pci_config_window* child_cfg;
+const struct pci_ops *child_ops;
 };
 
 struct pci_ops {
@@ -97,15 +100,20 @@ struct pci_ecam_ops {
 /* Default ECAM ops */
 extern const struct pci_ecam_ops pci_generic_ecam_ops;
 
-struct pci_host_bridge *pci_host_common_probe(struct dt_device_node *dev,
-  const struct pci_ecam_ops *ops,
-  size_t priv_sz);
+struct pci_host_bridge *
+pci_host_common_probe(struct dt_device_node *dev,
+  const struct pci_ecam_ops *ops,
+  const struct pci_ecam_ops *child_ops,
+  size_t priv_sz);
 int pci_generic_config_read(struct pci_host_bridge *bridge, pci_sbdf_t sbdf,
 uint32_t reg, uint32_t len, uint32_t *value);
 int pci_generic_config_write(struct pci_host_bridge *bridge, pci_sbdf_t sbdf,
  uint32_t reg, uint32_t len, uint32_t value);
 void __iomem *pci_ecam_map_bus(struct pci_host_bridge *bridge,
pci_sbdf_t sbdf, uint32_t where);
+void __iomem *pci_ecam_map_bus_generic(const struct pci_config_window *cfg,
+   const struct pci_ecam_ops *ops,
+   pci_sbdf_t sbdf, uint32_t where);
 bool pci_ecam_need_p2m_hwdom_mapping(struct domain *d,
  struct pci_host_bridge *bridge,
  uint64_t addr);
diff --git a/xen/arch/arm/pci/ecam.c b/xen/arch/arm/pci/ecam.c
index 3987f96b01..5486881c5c 100644
--- a/xen/arch/arm/pci/ecam.c
+++ b/xen/arch/arm/pci/ecam.c
@@ -20,12 +20,10 @@
 /*
  * Function to implement the pci_ops->map_bus method.
  */
-void __iomem *pci_ecam_map_bus(struct pci_host_bridge *bridge,
-   pci_sbdf_t sbdf, uint32_t where)
+void __iomem *pci_ecam_map_bus_generic(const struct pci_config_window *cfg,
+   const struct pci_ecam_ops *ops,
+   pci_sbdf_t sbdf, uint32_t where)
 {
-const struct pci_config_window *cfg = bridge->cfg;
-const struct pci_ecam_ops *ops =
-container_of(bridge->ops, const struct pci_ecam_ops, pci_ops);
 unsigned int devfn_shift = ops->bus_shift - 8;
 void __iomem *base;
 unsigned int busn = sbdf.bus;
@@ -39,6 +37,15 @@ void __iomem *pci_ecam_map_bus(struct pci_host_bridge 
*bridge,
 return base + (sbdf.devfn << devfn_shift) + where;
 }
 
+void __iomem *pci_ecam_map_bus(struct pci_host_bridge *bridge,
+   pci_sbdf_t sbdf, uint

Re: [PATCH v3 2/2] x86/iommu: avoid MSI address and data writes if IRT index hasn't changed

2025-03-11 Thread Roger Pau Monné

On Tue, Mar 11, 2025 at 08:35:35AM +0100, Jan Beulich wrote:
> On 10.03.2025 16:42, Roger Pau Monné wrote:
> > On Mon, Mar 10, 2025 at 11:51:09AM +0100, Jan Beulich wrote:
> >> On 10.03.2025 10:55, Roger Pau Monne wrote:
> >>> Attempt to reduce the MSI entry writes, and the associated checking 
> >>> whether
> >>> memory decoding and MSI-X is enabled for the PCI device, when the MSI data
> >>> hasn't changed.
> >>>
> >>> When using Interrupt Remapping the MSI entry will contain an index into
> >>> the remapping table, and it's in such remapping table where the MSI vector
> >>> and destination CPU is stored.  As such, when using interrupt remapping,
> >>> changes to the interrupt affinity shouldn't result in changes to the MSI
> >>> entry, and the MSI entry update can be avoided.
> >>>
> >>> Signal from the IOMMU update_ire_from_msi hook whether the MSI data or
> >>> address fields have changed, and thus need writing to the device 
> >>> registers.
> >>> Such signaling is done by returning 1 from the function.  Otherwise
> >>> returning 0 means no update of the MSI fields, and thus no write
> >>> required.
> >>>
> >>> Signed-off-by: Roger Pau Monné 
> >>
> >> Reviewed-by: Jan Beulich 
> >> with two purely cosmetic suggestions and an only loosely related question 
> >> below.
> >>
> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >>> @@ -415,7 +415,9 @@ static int cf_check vmx_pi_update_irte(const struct 
> >>> vcpu *v,
> >>>  
> >>>  ASSERT_PDEV_LIST_IS_READ_LOCKED(msi_desc->dev->domain);
> >>>  
> >>> -return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
> >>> +rc = iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
> >>> +
> >>> +return rc < 0 ? rc : 0;
> >>
> >> Only tangential here, but: Why does this function have a return type of
> >> non-void, when neither caller cares?
> > 
> > I'm afraid there's more wrong in vmx_pi_update_irte() that I've just
> > spotted afterwards.
> > 
> > vmx_pi_update_irte() passes to iommu_update_ire_from_msi() the
> > msi_desc->msg field, but that field is supposed to always contain the
> > non-translated MSI data, as you correctly pointed out in v1 it's
> > consumed by dump_msi().  vmx_pi_update_irte() using msi_desc->msg to
> > store the translated MSI effectively breaks dump_msi().
> 
> Oh, indeed - it violates what write_msi_msg() specifically checks by
> an assertion.

Indeed.  I have a pre-patch to fix this.

> > Also vmx_pi_update_irte() relies on the IRT index never changing, as
> > otherwise it's missing any logic to update the MSI registers.
> 
> Isn't this a valid assumption here? msi_msg_to_remap_entry() will only
> ever allocate a new index if none was previously allocated.

Yes, but I think it needs to be accounted for, I've now switched this
to:

rc = iommu_update_ire_from_msi(msi_desc, &msg);
if ( rc > 0 )
{
/*
 * Callers of vmx_pi_update_irte() won't propagate the updated MSI
 * fields to the hardware, must assert there are no changes.
 */
ASSERT_UNREACHABLE();
rc = -EILSEQ;
}

return rc;

Which I think better reflects the expectations of the function.

> >>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> >>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> >>> @@ -492,7 +492,7 @@ static int update_intremap_entry_from_msi_msg(
> >>> get_ivrs_mappings(iommu->seg)[alias_id].intremap_table);
> >>>  }
> >>>  
> >>> -return 0;
> >>> +return !fresh ? 0 : 1;
> >>>  }
> >>
> >> Simply
> >>
> >> return fresh;
> >>
> >> ?
> >>
> >>> @@ -546,7 +546,7 @@ int cf_check amd_iommu_msi_msg_update_ire(
> >>>  rc = update_intremap_entry_from_msi_msg(iommu, bdf, nr,
> >>>  &msi_desc->remap_index,
> >>>  msg, &data);
> >>> -if ( !rc )
> >>> +if ( rc > 0 )
> >>>  {
> >>>  for ( i = 1; i < nr; ++i )
> >>>  msi_desc[i].remap_index = msi_desc->remap_index + i;
> >>> --- a/xen/drivers/passthrough/vtd/intremap.c
> >>> +++ b/xen/drivers/passthrough/vtd/intremap.c
> >>> @@ -506,6 +506,7 @@ static int msi_msg_to_remap_entry(
> >>>  unsigned int index, i, nr = 1;
> >>>  unsigned long flags;
> >>>  const struct pi_desc *pi_desc = msi_desc->pi_desc;
> >>> +bool alloc = false;
> >>>  
> >>>  if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
> >>>  nr = msi_desc->msi.nvec;
> >>> @@ -529,6 +530,7 @@ static int msi_msg_to_remap_entry(
> >>>  index = alloc_remap_entry(iommu, nr);
> >>>  for ( i = 0; i < nr; ++i )
> >>>  msi_desc[i].remap_index = index + i;
> >>> +alloc = true;
> >>>  }
> >>>  else
> >>>  index = msi_desc->remap_index;
> >>> @@ -601,7 +603,7 @@ static int msi_msg_to_remap_entry(
> >>>  unmap_vtd_domain_page(iremap_entries);
> >>>  spin_unlock_irqrestore(&iommu->intremap.lock, flags);
> >>>  
> >>> -return 0;
> >

Re: [PATCH 1/2] xen/arm: Improve handling of nr_spis

2025-03-11 Thread Bertrand Marquis




> On 11 Mar 2025, at 10:59, Orzel, Michal  wrote:
> 
> 
> 
> On 11/03/2025 10:30, Bertrand Marquis wrote:
>> 
>> 
>> Hi Michal,
>> 
>>> On 11 Mar 2025, at 10:04, Michal Orzel  wrote:
>>> 
>>> At the moment, we print a warning about max number of IRQs supported by
>>> GIC bigger than vGIC only for hardware domain. This check is not hwdom
>>> special, and should be made common. Also, in case of user not specifying
>>> nr_spis for dom0less domUs, we should take into account max number of
>>> IRQs supported by vGIC if it's smaller than for GIC.
>>> 
>>> Introduce VGIC_MAX_IRQS macro and use it instead of hardcoded 992 value.
>>> Fix calculation of nr_spis for dom0less domUs and make the GIC/vGIC max
>>> IRQs comparison common.
>>> 
>>> Signed-off-by: Michal Orzel 
>>> ---
>>> xen/arch/arm/dom0less-build.c   | 2 +-
>>> xen/arch/arm/domain_build.c | 9 ++---
>>> xen/arch/arm/gic.c  | 3 +++
>>> xen/arch/arm/include/asm/vgic.h | 3 +++
>>> 4 files changed, 9 insertions(+), 8 deletions(-)
>>> 
>>> diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
>>> index 31f31c38da3f..9a84fee94119 100644
>>> --- a/xen/arch/arm/dom0less-build.c
>>> +++ b/xen/arch/arm/dom0less-build.c
>>> @@ -1018,7 +1018,7 @@ void __init create_domUs(void)
>>>{
>>>int vpl011_virq = GUEST_VPL011_SPI;
>>> 
>>> -d_cfg.arch.nr_spis = gic_number_lines() - 32;
>>> +d_cfg.arch.nr_spis = min(gic_number_lines(), VGIC_MAX_IRQS) - 
>>> 32;
>> 
>> I would suggest to introduce a static inline gic_nr_spis in a gic header ...
> Why GIC and not vGIC? This is domain's nr_spis, so vGIC.

yes vGIC sorry.

> But then, why static inline if the value does not change and is domain 
> agnostic?
> I struggle to find a good name for this macro. Maybe (in vgic.h):
> #define vgic_def_nr_spis (min(gic_number_lines(), VGIC_MAX_IRQS) - 32)
> to denote default nr_spis if not set by the user?

Yes that would work. My point is to prevent to have 2 definitions in 2 different
source file and a risk to forget to update one and not the other (let say if 
some
day we change 32 in 64).

> 
>> 
>>> 
>>>/*
>>> * The VPL011 virq is GUEST_VPL011_SPI, unless direct-map is
>>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>>> index 7cc141ef75e9..b99c4e3a69bf 100644
>>> --- a/xen/arch/arm/domain_build.c
>>> +++ b/xen/arch/arm/domain_build.c
>>> @@ -2371,13 +2371,8 @@ void __init create_dom0(void)
>>> 
>>>/* The vGIC for DOM0 is exactly emulating the hardware GIC */
>>>dom0_cfg.arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE;
>>> -/*
>>> - * Xen vGIC supports a maximum of 992 interrupt lines.
>>> - * 32 are substracted to cover local IRQs.
>>> - */
>>> -dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 992) - 
>>> 32;
>>> -if ( gic_number_lines() > 992 )
>>> -printk(XENLOG_WARNING "Maximum number of vGIC IRQs exceeded.\n");
>>> +/* 32 are substracted to cover local IRQs */
>>> +dom0_cfg.arch.nr_spis = min(gic_number_lines(), VGIC_MAX_IRQS) - 32;
>> 
>> and reuse it here to make sure the value used is always the same.
>> 
>>>dom0_cfg.arch.tee_type = tee_get_type();
>>>dom0_cfg.max_vcpus = dom0_max_vcpus();
>>> 
>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> index acf61a4de373..e80fe0ca2421 100644
>>> --- a/xen/arch/arm/gic.c
>>> +++ b/xen/arch/arm/gic.c
>>> @@ -251,6 +251,9 @@ void __init gic_init(void)
>>>panic("Failed to initialize the GIC drivers\n");
>>>/* Clear LR mask for cpu0 */
>>>clear_cpu_lr_mask();
>>> +
>>> +if ( gic_number_lines() > VGIC_MAX_IRQS )
>>> +printk(XENLOG_WARNING "Maximum number of vGIC IRQs exceeded\n");
>> 
>> I am a bit unsure with this one.
>> If this is the case, it means any gicv2 or gicv3 init detected an impossible 
>> value and
>> any usage of gic_number_lines would be using an impossible value.
> Why impossible? GIC can support up to 1020 IRQs. Our vGIC can support up to 
> 992
> IRQs.

Maybe unsupported is a better wording, my point is that it could lead to non 
working system
if say something uses irq 1000.

> 
>> 
>> Shouldn't this somehow result in a panic as the gic detection was wrong ?
>> Do you think we can continue to work safely if this value is over 
>> VGIC_MAX_IRQS.
>> There are other places using gic_number_lines like in irq.c.
> I can add a panic, sure. This would be a change in behavior because we used 
> this
> check for hwdom unconditionally. I'd need others opinion for this one.

ok.

Cheers
Bertrand

> 
> ~Michal

[PATCH v3] x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD

2025-03-11 Thread Roger Pau Monne

The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
Linux pre-6.14 is unconditionally attempting to read the MSR without a
safe MSR accessor, and since Xen doesn't allow access to it Linux reports
the following error:

unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0x8101d19f 
(xen_do_read_msr+0x7f/0xa0)
Call Trace:
 xen_read_msr+0x1e/0x30
 amd_get_mmconfig_range+0x2b/0x80
 quirk_amd_mmconfig_area+0x28/0x100
 pnp_fixup_device+0x39/0x50
 __pnp_add_device+0xf/0x150
 pnp_add_device+0x3d/0x100
 pnpacpi_add_device_handler+0x1f9/0x280
 acpi_ns_get_device_callback+0x104/0x1c0
 acpi_ns_walk_namespace+0x1d0/0x260
 acpi_get_devices+0x8a/0xb0
 pnpacpi_init+0x50/0x80
 do_one_initcall+0x46/0x2e0
 kernel_init_freeable+0x1da/0x2f0
 kernel_init+0x16/0x1b0
 ret_from_fork+0x30/0x50
 ret_from_fork_asm+0x1b/0x30

Such access is conditional to the presence of a device with PnP ID
"PNP0c01", which triggers the execution of the quirk_amd_mmconfig_area()
function.  Note that prior to commit 3fac3734c43a MSR accesses when running
as a PV guest would always use the safe variant, and thus silently handle
the #GP.

Fix by allowing access to the MSR on AMD systems for the hardware domain.

Write attempts to the MSR will still result in #GP for all domain types.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - #GP on domU accesses.

Changes since v1:
 - Expand commit message to note which device triggers the MSR read.
---
 xen/arch/x86/msr.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 1550fd9ec9f3..2cd46b6c8afa 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -318,6 +318,14 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t 
*val)
 *val = 0;
 break;
 
+case MSR_FAM10H_MMIO_CONF_BASE:
+if ( !is_hardware_domain(d) ||
+ !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
+ rdmsr_safe(msr, *val) )
+goto gp_fault;
+
+break;
+
 case MSR_VIRT_SPEC_CTRL:
 if ( !cp->extd.virt_ssbd )
 goto gp_fault;
-- 
2.48.1

Re: [PATCH v3] x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD

2025-03-11 Thread Jan Beulich

On 11.03.2025 13:22, Roger Pau Monne wrote:
> The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
> Linux pre-6.14 is unconditionally attempting to read the MSR without a
> safe MSR accessor, and since Xen doesn't allow access to it Linux reports
> the following error:
> 
> unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0x8101d19f 
> (xen_do_read_msr+0x7f/0xa0)
> Call Trace:
>  xen_read_msr+0x1e/0x30
>  amd_get_mmconfig_range+0x2b/0x80
>  quirk_amd_mmconfig_area+0x28/0x100
>  pnp_fixup_device+0x39/0x50
>  __pnp_add_device+0xf/0x150
>  pnp_add_device+0x3d/0x100
>  pnpacpi_add_device_handler+0x1f9/0x280
>  acpi_ns_get_device_callback+0x104/0x1c0
>  acpi_ns_walk_namespace+0x1d0/0x260
>  acpi_get_devices+0x8a/0xb0
>  pnpacpi_init+0x50/0x80
>  do_one_initcall+0x46/0x2e0
>  kernel_init_freeable+0x1da/0x2f0
>  kernel_init+0x16/0x1b0
>  ret_from_fork+0x30/0x50
>  ret_from_fork_asm+0x1b/0x30
> 
> Such access is conditional to the presence of a device with PnP ID
> "PNP0c01", which triggers the execution of the quirk_amd_mmconfig_area()
> function.  Note that prior to commit 3fac3734c43a MSR accesses when running
> as a PV guest would always use the safe variant, and thus silently handle
> the #GP.
> 
> Fix by allowing access to the MSR on AMD systems for the hardware domain.
> 
> Write attempts to the MSR will still result in #GP for all domain types.
> 
> Signed-off-by: Roger Pau Monné 

Just to record that I'm also fine with it this way:
Reviewed-by: Jan Beulich 

Jan

[PATCH v2 07/16] exec/exec-all: remove dependency on cpu.h

2025-03-11 Thread Pierrick Bouvier

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/exec/exec-all.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index dd5c40f2233..19b0eda44a7 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -20,7 +20,6 @@
 #ifndef EXEC_ALL_H
 #define EXEC_ALL_H
 
-#include "cpu.h"
 #if defined(CONFIG_USER_ONLY)
 #include "exec/cpu_ldst.h"
 #endif
-- 
2.39.5

[PATCH v3 1/3] xen: kconfig: rename MEM_ACCESS -> VM_EVENT

2025-03-11 Thread Sergiy Kibrik

Use more generic CONFIG_VM_EVENT name throughout Xen code instead of
CONFIG_MEM_ACCESS. This reflects the fact that vm_event is a higher level
feature, with mem_access & monitor depending on it.

CC: Jan Beulich 
Suggested-by: Tamas K Lengyel 
Acked-by: Tamas K Lengyel 
Signed-off-by: Sergiy Kibrik 
---
changes in v3:
 - squash with automation patch
 - tags & minor addition of blanks
---
 automation/gitlab-ci/build.yaml | 2 +-
 xen/arch/arm/Makefile   | 2 +-
 xen/arch/arm/configs/tiny64_defconfig   | 2 +-
 xen/arch/arm/include/asm/mem_access.h   | 4 ++--
 xen/arch/ppc/configs/ppc64_defconfig| 2 +-
 xen/arch/riscv/configs/tiny64_defconfig | 2 +-
 xen/arch/x86/mm/Makefile| 2 +-
 xen/common/Kconfig  | 2 +-
 xen/common/Makefile | 2 +-
 xen/common/domctl.c | 2 +-
 xen/include/xen/mem_access.h| 6 +++---
 xen/include/xsm/dummy.h | 2 +-
 xen/include/xsm/xsm.h   | 4 ++--
 xen/xsm/dummy.c | 2 +-
 xen/xsm/flask/hooks.c   | 4 ++--
 15 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index 034d6d9c3a..b6383c4fc8 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -744,7 +744,7 @@ debian-12-riscv64-gcc:
   CONFIG_EXPERT=y
   CONFIG_GRANT_TABLE=n
   CONFIG_LIVEPATCH=n
-  CONFIG_MEM_ACCESS=n
+  CONFIG_VM_EVENT=n
   CONFIG_QEMU_PLATFORM=y
   CONFIG_XSM=n
 
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index fb0948f067..3bd5125e64 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -38,7 +38,7 @@ obj-y += irq.o
 obj-y += kernel.init.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-$(CONFIG_LLC_COLORING) += llc-coloring.o
-obj-$(CONFIG_MEM_ACCESS) += mem_access.o
+obj-$(CONFIG_VM_EVENT) += mem_access.o
 obj-y += mm.o
 obj-y += monitor.o
 obj-y += p2m.o
diff --git a/xen/arch/arm/configs/tiny64_defconfig 
b/xen/arch/arm/configs/tiny64_defconfig
index cc6d93f2f8..469a1eb9f9 100644
--- a/xen/arch/arm/configs/tiny64_defconfig
+++ b/xen/arch/arm/configs/tiny64_defconfig
@@ -5,7 +5,7 @@ CONFIG_ARM=y
 # Architecture Features
 #
 # CONFIG_GICV3 is not set
-# CONFIG_MEM_ACCESS is not set
+# CONFIG_VM_EVENT is not set
 # CONFIG_SBSA_VUART_CONSOLE is not set
 
 #
diff --git a/xen/arch/arm/include/asm/mem_access.h 
b/xen/arch/arm/include/asm/mem_access.h
index abac8032fc..d42f28e8b7 100644
--- a/xen/arch/arm/include/asm/mem_access.h
+++ b/xen/arch/arm/include/asm/mem_access.h
@@ -37,7 +37,7 @@ static inline bool p2m_mem_access_sanity_check(struct domain 
*d)
  * Send mem event based on the access. Boolean return value indicates if trap
  * needs to be injected into guest.
  */
-#ifdef CONFIG_MEM_ACCESS
+#ifdef CONFIG_VM_EVENT
 bool p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec);
 
 struct page_info*
@@ -58,7 +58,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long 
flag,
 return NULL;
 }
 
-#endif /*CONFIG_MEM_ACCESS*/
+#endif /* CONFIG_VM_EVENT */
 #endif /* _ASM_ARM_MEM_ACCESS_H */
 
 /*
diff --git a/xen/arch/ppc/configs/ppc64_defconfig 
b/xen/arch/ppc/configs/ppc64_defconfig
index 4924d881a2..d6aaf772e7 100644
--- a/xen/arch/ppc/configs/ppc64_defconfig
+++ b/xen/arch/ppc/configs/ppc64_defconfig
@@ -1,6 +1,6 @@
 # CONFIG_GRANT_TABLE is not set
 # CONFIG_SPECULATIVE_HARDEN_ARRAY is not set
-# CONFIG_MEM_ACCESS is not set
+# CONFIG_VM_EVENT is not set
 
 CONFIG_PPC64=y
 CONFIG_DEBUG=y
diff --git a/xen/arch/riscv/configs/tiny64_defconfig 
b/xen/arch/riscv/configs/tiny64_defconfig
index bb3ae26a44..2399f7b918 100644
--- a/xen/arch/riscv/configs/tiny64_defconfig
+++ b/xen/arch/riscv/configs/tiny64_defconfig
@@ -1,6 +1,6 @@
 # CONFIG_BOOT_TIME_CPUPOOLS is not set
 # CONFIG_GRANT_TABLE is not set
-# CONFIG_MEM_ACCESS is not set
+# CONFIG_VM_EVENT is not set
 # CONFIG_COVERAGE is not set
 # CONFIG_LIVEPATCH is not set
 # CONFIG_XSM is not set
diff --git a/xen/arch/x86/mm/Makefile b/xen/arch/x86/mm/Makefile
index 0345388359..960f6e8409 100644
--- a/xen/arch/x86/mm/Makefile
+++ b/xen/arch/x86/mm/Makefile
@@ -4,7 +4,7 @@ obj-$(CONFIG_HVM) += hap/
 obj-$(CONFIG_ALTP2M) += altp2m.o
 obj-$(CONFIG_HVM) += guest_walk_2.o guest_walk_3.o guest_walk_4.o
 obj-$(CONFIG_SHADOW_PAGING) += guest_walk_4.o
-obj-$(CONFIG_MEM_ACCESS) += mem_access.o
+obj-$(CONFIG_VM_EVENT) += mem_access.o
 obj-$(CONFIG_MEM_PAGING) += mem_paging.o
 obj-$(CONFIG_MEM_SHARING) += mem_sharing.o
 obj-$(CONFIG_HVM) += nested.o
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 6166327f4d..a6aa2c5c14 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -92,7 +92,7 @@ config HAS_VMAP
 config MEM_ACCESS_ALWAYS_ON
bool
 
-config MEM_ACCESS
+config VM_EVENT
def_bool MEM_ACCESS_ALWAYS_ON
prompt "Memory Access and VM events" if !MEM_ACCESS_ALWAYS_ON
depends

[PATCH v2 5/7] xen/arm: rcar4: add delay after programming ATU

2025-03-11 Thread Mykyta Poturai

From: Volodymyr Babchuk 

For some reason, we need a delay before accessing ATU region after
we programmed it. Otherwise, we'll get erroneous TLP.

There is a code below, which should do this in proper way, by polling
CTRL2 register, but according to documentation, hardware does not
change this ATU_ENABLE bit at all.

Signed-off-by: Volodymyr Babchuk 
Signed-off-by: Mykyta Poturai 
---
v1->v2:
* rebased
---
 xen/arch/arm/pci/pci-designware.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/arm/pci/pci-designware.c 
b/xen/arch/arm/pci/pci-designware.c
index 6ab03cf9b0..def2c12d63 100644
--- a/xen/arch/arm/pci/pci-designware.c
+++ b/xen/arch/arm/pci/pci-designware.c
@@ -194,6 +194,11 @@ static void dw_pcie_prog_outbound_atu_unroll(struct 
pci_host_bridge *pci,
 dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL2,
  PCIE_ATU_ENABLE);
 
+/*
+ * HACK: We need to delay there, because the next code does not
+ * work as expected on S4
+ */
+mdelay(1);
 /*
  * Make sure ATU enable takes effect before any subsequent config
  * and I/O accesses.
-- 
2.34.1

[PATCH v2 7/7] xen/arm: rcar4: program ATU to accesses to all functions

2025-03-11 Thread Mykyta Poturai

From: Volodymyr Babchuk 

According to ATU documentation, bits [18:16] of accessed memory
address correspond to a function number. This is somewhat similar to
ECAM, but with huge holes between regions.

We can use this to minimize number of ATU re-programmings: configure
ATU to access BDF with F=0 and adjust memory address with function
number.

Taking into account the previous patch, that optimizes ATU
reprogramming by skipping call to __dw_pcie_prog_outbound_atu() if we
already configured pci_address, we can be sure that accesses to all
functions of one device will not trigger ATU reprogramming at all.

Signed-off-by: Volodymyr Babchuk 
Signed-off-by: Mykyta Poturai 
---
v1->v2:
* rebased
---
 xen/arch/arm/pci/pci-designware.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/pci/pci-designware.c 
b/xen/arch/arm/pci/pci-designware.c
index cec52cf81a..fec8c7296d 100644
--- a/xen/arch/arm/pci/pci-designware.c
+++ b/xen/arch/arm/pci/pci-designware.c
@@ -297,7 +297,7 @@ void __iomem *dw_pcie_child_map_bus(struct pci_host_bridge 
*bridge,
 uint32_t busdev;
 
 busdev = PCIE_ATU_BUS(sbdf.bus) | PCIE_ATU_DEV(PCI_SLOT(sbdf.devfn)) |
-PCIE_ATU_FUNC(PCI_FUNC(sbdf.devfn));
+PCIE_ATU_FUNC(0);
 
 /* FIXME: Parent is the root bus, so use PCIE_ATU_TYPE_CFG0. */
 dw_pcie_prog_outbound_atu(bridge, PCIE_ATU_REGION_INDEX1,
@@ -305,7 +305,7 @@ void __iomem *dw_pcie_child_map_bus(struct pci_host_bridge 
*bridge,
   bridge->child_cfg->phys_addr,
   busdev, bridge->child_cfg->size);
 
-return bridge->child_cfg->win + where;
+return bridge->child_cfg->win + ((uint32_t)sbdf.fn << 16) + where;
 }
 
 int dw_pcie_child_config_read(struct pci_host_bridge *bridge,
-- 
2.34.1

[PATCH v2 2/7] xen/arm: make pci_host_common_probe return the bridge

2025-03-11 Thread Mykyta Poturai

From: Oleksandr Andrushchenko 

Some of the PCI host bridges require additional processing during the
probe phase. For that they need to access struct bridge of the probed
host, so return pointer to the new bridge from pci_host_common_probe.

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Mykyta Poturai 
---
v1->v2:
* no change
---
 xen/arch/arm/include/asm/pci.h  |  8 +---
 xen/arch/arm/pci/pci-host-common.c  | 12 ++--
 xen/arch/arm/pci/pci-host-generic.c |  2 +-
 xen/arch/arm/pci/pci-host-zynqmp.c  |  2 +-
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/include/asm/pci.h b/xen/arch/arm/include/asm/pci.h
index 44344ac8c1..e1f63d75e3 100644
--- a/xen/arch/arm/include/asm/pci.h
+++ b/xen/arch/arm/include/asm/pci.h
@@ -17,6 +17,8 @@
 
 #ifdef CONFIG_HAS_PCI
 
+#include 
+
 #include 
 
 #define pci_to_dev(pcidev) (&(pcidev)->arch.dev)
@@ -95,9 +97,9 @@ struct pci_ecam_ops {
 /* Default ECAM ops */
 extern const struct pci_ecam_ops pci_generic_ecam_ops;
 
-int pci_host_common_probe(struct dt_device_node *dev,
-  const struct pci_ecam_ops *ops,
-  size_t priv_sz);
+struct pci_host_bridge *pci_host_common_probe(struct dt_device_node *dev,
+  const struct pci_ecam_ops *ops,
+  size_t priv_sz);
 int pci_generic_config_read(struct pci_host_bridge *bridge, pci_sbdf_t sbdf,
 uint32_t reg, uint32_t len, uint32_t *value);
 int pci_generic_config_write(struct pci_host_bridge *bridge, pci_sbdf_t sbdf,
diff --git a/xen/arch/arm/pci/pci-host-common.c 
b/xen/arch/arm/pci/pci-host-common.c
index be7e6c3510..e4e05d1176 100644
--- a/xen/arch/arm/pci/pci-host-common.c
+++ b/xen/arch/arm/pci/pci-host-common.c
@@ -208,9 +208,9 @@ static int pci_bus_find_domain_nr(struct dt_device_node 
*dev)
 return domain;
 }
 
-int pci_host_common_probe(struct dt_device_node *dev,
-  const struct pci_ecam_ops *ops,
-  size_t priv_sz)
+struct pci_host_bridge *pci_host_common_probe(struct dt_device_node *dev,
+  const struct pci_ecam_ops *ops,
+  size_t priv_sz)
 {
 struct pci_host_bridge *bridge;
 struct pci_config_window *cfg;
@@ -222,7 +222,7 @@ int pci_host_common_probe(struct dt_device_node *dev,
 
 bridge = pci_alloc_host_bridge();
 if ( !bridge )
-return -ENOMEM;
+return ERR_PTR(-ENOMEM);
 
 /* Parse and map our Configuration Space windows */
 cfg = gen_pci_init(dev, ops);
@@ -257,7 +257,7 @@ int pci_host_common_probe(struct dt_device_node *dev,
 
 pci_add_host_bridge(bridge);
 
-return 0;
+return bridge;
 
 err_priv:
 xfree(bridge->priv);
@@ -265,7 +265,7 @@ err_priv:
 err_exit:
 xfree(bridge);
 
-return err;
+return ERR_PTR(err);
 }
 
 /*
diff --git a/xen/arch/arm/pci/pci-host-generic.c 
b/xen/arch/arm/pci/pci-host-generic.c
index cc4bc70684..dde6a79a8e 100644
--- a/xen/arch/arm/pci/pci-host-generic.c
+++ b/xen/arch/arm/pci/pci-host-generic.c
@@ -29,7 +29,7 @@ static const struct dt_device_match __initconstrel 
gen_pci_dt_match[] =
 static int __init pci_host_generic_probe(struct dt_device_node *dev,
  const void *data)
 {
-return pci_host_common_probe(dev, &pci_generic_ecam_ops, 0);
+return PTR_RET(pci_host_common_probe(dev, &pci_generic_ecam_ops, 0));
 }
 
 DT_DEVICE_START(pci_gen, "PCI HOST GENERIC", DEVICE_PCI_HOSTBRIDGE)
diff --git a/xen/arch/arm/pci/pci-host-zynqmp.c 
b/xen/arch/arm/pci/pci-host-zynqmp.c
index 985a43c516..c796447ac4 100644
--- a/xen/arch/arm/pci/pci-host-zynqmp.c
+++ b/xen/arch/arm/pci/pci-host-zynqmp.c
@@ -47,7 +47,7 @@ static const struct dt_device_match __initconstrel 
nwl_pcie_dt_match[] =
 static int __init pci_host_generic_probe(struct dt_device_node *dev,
  const void *data)
 {
-return pci_host_common_probe(dev, &nwl_pcie_ops, 0);
+return PTR_RET(pci_host_common_probe(dev, &nwl_pcie_ops, 0));
 }
 
 DT_DEVICE_START(pci_gen, "PCI HOST ZYNQMP", DEVICE_PCI_HOSTBRIDGE)
-- 
2.34.1

[ImageBuilder][PATCH v2] uboot-script-gen: handle reserved memory regions

2025-03-11 Thread Luca Miccio

Currently, the uboot-script-gen does not account for reserved memory
regions in the device tree. This oversight can lead to scenarios where
one or more boot modules overlap with a reserved region. As a result,
Xen will always crash upon detecting this overlap. However, the crash
will be silent (without output) if earlyprintk is not enabled, which is
the default setting at the moment.

To address this issue, add a function that iterates over the
reserved-memory nodes if any and populates an array. This array will
be used later to calculate the load address for any given file.

Signed-off-by: Luca Miccio 
---
v1->v2
 - added local
 - check if reserved node exists
 - handle fdtget errors
 - use numeric check for duplicates
---
 scripts/uboot-script-gen | 84 ++--
 1 file changed, 81 insertions(+), 3 deletions(-)

diff --git a/scripts/uboot-script-gen b/scripts/uboot-script-gen
index db2c011..edc6a4c 100755
--- a/scripts/uboot-script-gen
+++ b/scripts/uboot-script-gen
@@ -468,6 +468,67 @@ function device_tree_editing()
 fi
 }
 
+function fill_reserved_spaces_from_dtb()
+{
+if [ ! -f $DEVICE_TREE ]
+then
+echo "File $DEVICE_TREE doesn't exist, exiting";
+cleanup_and_return_err
+fi
+
+# Check if reserved-memory node exists
+if ! fdtget -l $DEVICE_TREE /reserved-memory &> /dev/null
+then
+return
+fi
+
+local addr_cells=$(fdtget -t x $DEVICE_TREE /reserved-memory 
'#address-cells' 2> /dev/null\
+|| fdtget -t x $DEVICE_TREE / '#address-cells' 2> 
/dev/null)
+local size_cells=$(fdtget -t x $DEVICE_TREE /reserved-memory '#size-cells' 
2> /dev/null\
+|| fdtget -t x $DEVICE_TREE / '#size-cells' 2> /dev/null)
+
+if [ -z "$addr_cells" ] || [ -z "$size_cells" ]; then
+echo "Could not find #address-cells or #size-cells properties for 
reserved-memory node in $DEVICE_TREE"
+cleanup_and_return_err
+fi
+
+for node in $(fdtget -l $DEVICE_TREE /reserved-memory); do
+local reg_values=($(fdtget -t x $DEVICE_TREE /reserved-memory/$node 
reg))
+local i=0
+for ((i=0; i<${#reg_values[@]}; i+=addr_cells+size_cells)); do
+local addr=0
+local size=0
+local j=0
+
+for ((j=0; j

[RFC PATCH v3 2/7] xen/arm: scmi-smc: update to be used under sci subsystem

2025-03-11 Thread Grygorii Strashko

The introduced SCI (System Control Interface) subsystem provides unified
interface to integrate in Xen SCI drivers which adds support for ARM
firmware (EL3, SCP) based software interfaces (like SCMI) that are used in
system management. The SCI subsystem allows to add drivers for different FW
interfaces or have different drivers for the same FW interface (for example,
SCMI with different transports).

This patch updates SCMI over SMC calls handling layer, introduced by
commit 3e322bef8bc0 ("xen/arm: firmware: Add SCMI over SMC calls handling
layer"), to be SCI driver:
- convert to DT device;
- convert to SCI Xen interface.

Signed-off-by: Grygorii Strashko 
---
 xen/arch/arm/firmware/Kconfig| 13 ++-
 xen/arch/arm/firmware/scmi-smc.c | 93 +++-
 xen/arch/arm/include/asm/firmware/scmi-smc.h | 41 -
 xen/arch/arm/vsmc.c  |  5 +-
 xen/include/public/arch-arm.h|  1 +
 5 files changed, 64 insertions(+), 89 deletions(-)
 delete mode 100644 xen/arch/arm/include/asm/firmware/scmi-smc.h

diff --git a/xen/arch/arm/firmware/Kconfig b/xen/arch/arm/firmware/Kconfig
index fc7918c7fc56..02d7b600317f 100644
--- a/xen/arch/arm/firmware/Kconfig
+++ b/xen/arch/arm/firmware/Kconfig
@@ -8,9 +8,18 @@ config ARM_SCI
 
 menu "Firmware Drivers"
 
+choice
+   prompt "ARM SCI driver type"
+   default ARM_SCI_NONE
+   help
+   Choose which ARM SCI driver to enable.
+
+config ARM_SCI_NONE
+   bool "none"
+
 config SCMI_SMC
bool "Forward SCMI over SMC calls from hwdom to EL3 firmware"
-   default y
+   select ARM_SCI
help
  This option enables basic awareness for SCMI calls using SMC as
  doorbell mechanism and Shared Memory for transport ("arm,scmi-smc"
@@ -18,4 +27,6 @@ config SCMI_SMC
  firmware node is used to trap and forward corresponding SCMI SMCs
  to firmware running at EL3, for calls coming from the hardware domain.
 
+endchoice
+
 endmenu
diff --git a/xen/arch/arm/firmware/scmi-smc.c b/xen/arch/arm/firmware/scmi-smc.c
index 33473c04b181..188bd659513b 100644
--- a/xen/arch/arm/firmware/scmi-smc.c
+++ b/xen/arch/arm/firmware/scmi-smc.c
@@ -9,6 +9,7 @@
  * Copyright 2024 NXP
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -16,12 +17,11 @@
 #include 
 #include 
 
+#include 
 #include 
-#include 
 
 #define SCMI_SMC_ID_PROP   "arm,smc-id"
 
-static bool __ro_after_init scmi_enabled;
 static uint32_t __ro_after_init scmi_smc_id;
 
 /*
@@ -41,14 +41,11 @@ static bool scmi_is_valid_smc_id(uint32_t fid)
  *
  * Returns true if SMC was handled (regardless of response), false otherwise.
  */
-bool scmi_handle_smc(struct cpu_user_regs *regs)
+static bool scmi_handle_smc(struct cpu_user_regs *regs)
 {
 uint32_t fid = (uint32_t)get_user_reg(regs, 0);
 struct arm_smccc_res res;
 
-if ( !scmi_enabled )
-return false;
-
 if ( !scmi_is_valid_smc_id(fid) )
 return false;
 
@@ -78,49 +75,45 @@ bool scmi_handle_smc(struct cpu_user_regs *regs)
 return true;
 }
 
-static int __init scmi_check_smccc_ver(void)
+static int scmi_smc_domain_init(struct domain *d,
+struct xen_domctl_createdomain *config)
 {
-if ( smccc_ver < ARM_SMCCC_VERSION_1_1 )
-{
-printk(XENLOG_WARNING
-   "SCMI: No SMCCC 1.1 support, SCMI calls forwarding disabled\n");
-return -ENOSYS;
-}
+if ( !is_hardware_domain(d) )
+return 0;
 
+d->arch.sci_enabled = true;
+printk(XENLOG_DEBUG "SCMI: %pd init\n", d);
 return 0;
 }
 
-static int __init scmi_dt_init_smccc(void)
+static void scmi_smc_domain_destroy(struct domain *d)
 {
-static const struct dt_device_match scmi_ids[] __initconst =
-{
-/* We only support "arm,scmi-smc" binding for now */
-DT_MATCH_COMPATIBLE("arm,scmi-smc"),
-{ /* sentinel */ },
-};
-const struct dt_device_node *scmi_node;
-int ret;
+if ( !is_hardware_domain(d) )
+return;
 
-/* If no SCMI firmware node found, fail silently as it's not mandatory */
-scmi_node = dt_find_matching_node(NULL, scmi_ids);
-if ( !scmi_node )
-return -EOPNOTSUPP;
+printk(XENLOG_DEBUG "SCMI: %pd destroy\n", d);
+}
 
-ret = dt_property_read_u32(scmi_node, SCMI_SMC_ID_PROP, &scmi_smc_id);
-if ( !ret )
+static int __init scmi_check_smccc_ver(void)
+{
+if ( smccc_ver < ARM_SMCCC_VERSION_1_1 )
 {
-printk(XENLOG_ERR "SCMI: No valid \"%s\" property in \"%s\" DT node\n",
-   SCMI_SMC_ID_PROP, scmi_node->full_name);
-return -ENOENT;
+printk(XENLOG_WARNING
+   "SCMI: No SMCCC 1.1 support, SCMI calls forwarding disabled\n");
+return -ENOSYS;
 }
 
-scmi_enabled = true;
-
 return 0;
 }
 
+static const struct sci_mediator_ops scmi_smc_ops = {
+.handle_call = scmi_handle_smc,
+.domain_init = scmi_smc_domain_init,
+.domain_destroy = scmi_

[RFC PATCH v3 0/7] xen/arm: scmi: introduce SCI SCMI SMC multi-agent support

2025-03-11 Thread Grygorii Strashko

Hi,

This is respin of RFCv2 series from Oleksii Moisieiev [1] which was send pretty 
long time ago (2022),
with the main intention is to resume this discussion in public and gather more 
opinions.

Hence the code was previously sent long time ago there are pretty high number 
of changes,
including rebase on newer Xen version 4.20.0-rc2, which already contains some 
basic SCMI SMC driver
introduced by Andrei Cherechesu, commit 3e322bef8bc0 ("xen/arm: firmware: Add 
SCMI over SMC calls
handling layer").

Patch 1 "xen/arm: add generic SCI subsystem"
- rebased and refactored
- introduced DEVICE_ARM_SCI DT device class and used for SCI drivers probing 
instead of custom,
  linker sections based implementation.
- added SCI API for Dom0 DT handling, instead of manipulating with ARM arch 
dom0 code directly.
- TODO: RFC changes in XEN_DOMCTL_assign_device OP processing

Patch 2 "xen/arm: scmi-smc: update to be used under sci subsystem"
- update driver introduced by commit 3e322bef8bc0 ("xen/arm: firmware: Add SCMI 
over SMC calls
handling layer") be used under sci subsystem.
- no functional changes in general

Patch 3 "xen/arm: scmi-smc: passthrough SCMI SMC to guest domain
This is new change which allows passthrough SCMI SMC, single agent interface to 
guest domain
cover use case "thin Dom0 with guest domain, which serves as Driver domain".
See patch commit message for full description.

Patch 4 - xen/arm: scmi: introduce SCI SCMI SMC multi-agent driver
- added "xen,scmi-secondary-agents" property in "chosen" to inform SCI SCMI 
multi-agent driver
  about available agents and their configuration. It defines  to 
 map.
  This option is Xen specific as Xen is the only one entry in the system which 
need to know
  about SCMI multi-agent support and configuration.
- each guest using SCMI should be configured with SCMI agent_id, so SCMI
  FW can implement Agent-specific permission policy.
  -- dom0: dom0_scmi_agent_id= in Xen command line option
  -- toolstack: arm_sci = "type=scmi_smc_multiagent,agent_id="
  -- dom0less: todo: "xen,sci_type", "xen,sci_agent_id" properties in 
"xen,domain" nodes.
- factored out SCMI generic definitions (re-usable)
- factored out SCMI shmem code (re-usable)
- the SCMI passthrough configuration for guest domains is similar to any other 
HW passthrough cfg.

Patches 5-7
- no major changes, except to follow rebase and changes in previous patches

Regarding patches 5-7 I'd like to rise a question and I, personally, feel very 
skeptical doing any
kind of SCMI DT nodes generation as from toolstack as from Xen.
1) SCMI is no differ as any other HW MFD device, and HW passthrough 
configuration works for it in
   the same way.
2) if toolstack generates DT then dom0less case might need it also, but this 
means more code in Xen,
   so, with certification in mind, it means more overhead requirements, docs 
and testing.
   In my opinion if something can be done outside "kernel" - it should.
   So better invest in tools (imagebuilder, lopper, etc.) instead.
3) Hence SCMI DT bindings are pretty complex the corresponding guest DT nodes 
can't be generated
   from scratch - the user still need to add scmi node, protocols and protocols 
subnodes in the
   partial device tree, at least empty. But stop, not exactly empty - the 
properties like
   "#clock-cells" need to be added to avoid DTC warnings. Such behavior is 
rather confusing than
   helpful.
4) Exposing the Host Device tree in Dom0 is another questionable thing for 
toolstack SCMI DT
   generation. It consumes 128K of memory on Renesas r8a779g0-whitehawk.
5) No needs for additional public API (XEN_DOMCTL_get_sci_info, 
GUEST_SCI_SHMEM_BASE..) if dropped 

Code can be found at:
https://github.com/GrygiriiS/xen/commits/master_v4h_sci_v13_dt_gen

[1] RFC v2:
https://patchwork.kernel.org/project/xen-devel/cover/cover.1644341635.git.oleksii_moisie...@epam.com/

SCMI spec:
https://developer.arm.com/documentation/den0056/e/?lang=en

SCMI bindings:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/firmware/arm,scmi.yaml
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/access-controllers/access-controllers.yaml

Reference EL3 FW:
RPI5: https://github.com/xen-troops/arm-trusted-firmware/commits/rpi5_dev/
Renesas v4h: 
https://github.com/GrygiriiS/arm-trusted-firmware/commits/rcar_gen4_v2.7_v4x-scmi_upd/

base-commit: c3f5d1bb40b5 ("automation/cirrus-ci: introduce FreeBSD randconfig 
builds")

Grygorii Strashko (2):
  xen/arm: scmi-smc: update to be used under sci subsystem
  xen/arm: scmi-smc: passthrough SCMI SMC to domain, single agent

Oleksii Moisieiev (5):
  xen/arm: add generic SCI subsystem
  xen/arm: scmi: introduce SCI SCMI SMC multi-agent driver
  libs: libxenhypfs - handle blob properties
  xen/arm: Export host device-tree to hypfs
  xen/arm: scmi: generate scmi dt node for DomUs

 MAINTAINERS

[RFC PATCH v3 5/7] libs: libxenhypfs - handle blob properties

2025-03-11 Thread Grygorii Strashko

From: Oleksii Moisieiev 

libxenhypfs will return blob properties as is. This output can be used
to retrieve information from the hypfs. Caller is responsible for
parsing property value.

Signed-off-by: Oleksii Moisieiev 
Reviewed-by: Volodymyr Babchuk 
---
 tools/libs/hypfs/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/libs/hypfs/core.c b/tools/libs/hypfs/core.c
index 52b30db8d777..d09bba7d8c86 100644
--- a/tools/libs/hypfs/core.c
+++ b/tools/libs/hypfs/core.c
@@ -307,8 +307,6 @@ char *xenhypfs_read(xenhypfs_handle *fshdl, const char 
*path)
 errno = EISDIR;
 break;
 case xenhypfs_type_blob:
-errno = EDOM;
-break;
 case xenhypfs_type_string:
 ret_buf = buf;
 buf = NULL;
-- 
2.34.1

Re: [PATCH 1/3] x86/P2M: synchronize fast and slow paths of p2m_get_page_from_gfn()

2025-03-11 Thread Roger Pau Monné

On Tue, Mar 11, 2025 at 09:44:18AM +0100, Jan Beulich wrote:
> On 10.03.2025 15:53, Roger Pau Monné wrote:
> > On Wed, Feb 26, 2025 at 12:52:27PM +0100, Jan Beulich wrote:
> >> Handling of both grants and foreign pages was different between the two
> >> paths.
> >>
> >> While permitting access to grants would be desirable, doing so would
> >> require more involved handling; undo that for the time being. In
> >> particular the page reference obtained would prevent the owning domain
> >> from changing e.g. the page's type (after the grantee has released the
> >> last reference of the grant). Instead perhaps another reference on the
> >> grant would need obtaining. Which in turn would require determining
> >> which grant that was.
> >>
> >> Foreign pages in any event need permitting on both paths.
> >>
> >> Introduce a helper function to be used on both paths, such that
> >> respective checking differs in just the extra "to be unshared" condition
> >> on the fast path.
> >>
> >> While there adjust the sanity check for foreign pages: Don't leak the
> >> reference on release builds when on a debug build the assertion would
> >> have triggered. (Thanks to Roger for the suggestion.)
> >>
> >> Fixes: 80ea7af17269 ("x86/mm: Introduce get_page_from_gfn()")
> >> Fixes: 50fe6e737059 ("pvh dom0: add and remove foreign pages")
> >> Fixes: cbbca7be4aaa ("x86/p2m: make p2m_get_page_from_gfn() handle grant 
> >> case correctly")
> >> Signed-off-by: Jan Beulich 
> > 
> > Just a couple of nits below (with a reply to your RFC).
> > 
> >> ---
> >> RFC: While the helper could take const struct domain * as first
> >>  parameter, for a P2M function it seemed more natural to have it
> >>  take const struct p2m_domain *.
> >>
> >> --- a/xen/arch/x86/mm/p2m.c
> >> +++ b/xen/arch/x86/mm/p2m.c
> >> @@ -328,12 +328,45 @@ void p2m_put_gfn(struct p2m_domain *p2m,
> >>  gfn_unlock(p2m, gfn_x(gfn), 0);
> >>  }
> >>  
> >> +static struct page_info *get_page_from_mfn_and_type(
> >> +const struct p2m_domain *p2m, mfn_t mfn, p2m_type_t t)
> > 
> > Re your RFC: since it's a static function, just used for
> > p2m_get_page_from_gfn(), I would consider passing a domain instead of
> > a p2m_domain, as the solely usage of p2m is to obtain the domain.
> 
> Okay, will do.
> 
> >> +{
> >> +struct page_info *page;
> >> +
> >> +if ( !mfn_valid(mfn) )
> >> +return NULL;
> >> +
> >> +page = mfn_to_page(mfn);
> >> +
> >> +if ( p2m_is_ram(t) )
> > 
> > Should this be a likely() to speed up the common successful path?
> 
> Well. Andrew's general advice looks to be to avoid likely() / unlikely()
> in almost all situations. Therefore unless he positively indicates that
> in a case like this using likely() is acceptable, I'd rather stay away
> from adding that.

Oh, OK.  My suggestion was based on the use of unlikely below.  I
assume the later unlikely() in the patch is just inheritance from the
previous code.

> docs/process/coding-best-practices.pandoc could certainly do with some
> rough guidelines on when adding these two is acceptable (or even
> desirable). Right now to me it is largely unclear when their use is
> deemed okay; it certainly doesn't match anymore what I was told some
> 20 years ago when I started working on Linux.

I would be happy to have some guidance there, as I think I've been a
bit erratic with suggestions about how to use them.

Thanks, Roger.

[PATCH 1/2] xen/arm: Improve handling of nr_spis

2025-03-11 Thread Michal Orzel

At the moment, we print a warning about max number of IRQs supported by
GIC bigger than vGIC only for hardware domain. This check is not hwdom
special, and should be made common. Also, in case of user not specifying
nr_spis for dom0less domUs, we should take into account max number of
IRQs supported by vGIC if it's smaller than for GIC.

Introduce VGIC_MAX_IRQS macro and use it instead of hardcoded 992 value.
Fix calculation of nr_spis for dom0less domUs and make the GIC/vGIC max
IRQs comparison common.

Signed-off-by: Michal Orzel 
---
 xen/arch/arm/dom0less-build.c   | 2 +-
 xen/arch/arm/domain_build.c | 9 ++---
 xen/arch/arm/gic.c  | 3 +++
 xen/arch/arm/include/asm/vgic.h | 3 +++
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 31f31c38da3f..9a84fee94119 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -1018,7 +1018,7 @@ void __init create_domUs(void)
 {
 int vpl011_virq = GUEST_VPL011_SPI;
 
-d_cfg.arch.nr_spis = gic_number_lines() - 32;
+d_cfg.arch.nr_spis = min(gic_number_lines(), VGIC_MAX_IRQS) - 32;
 
 /*
  * The VPL011 virq is GUEST_VPL011_SPI, unless direct-map is
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 7cc141ef75e9..b99c4e3a69bf 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2371,13 +2371,8 @@ void __init create_dom0(void)
 
 /* The vGIC for DOM0 is exactly emulating the hardware GIC */
 dom0_cfg.arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE;
-/*
- * Xen vGIC supports a maximum of 992 interrupt lines.
- * 32 are substracted to cover local IRQs.
- */
-dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 992) - 32;
-if ( gic_number_lines() > 992 )
-printk(XENLOG_WARNING "Maximum number of vGIC IRQs exceeded.\n");
+/* 32 are substracted to cover local IRQs */
+dom0_cfg.arch.nr_spis = min(gic_number_lines(), VGIC_MAX_IRQS) - 32;
 dom0_cfg.arch.tee_type = tee_get_type();
 dom0_cfg.max_vcpus = dom0_max_vcpus();
 
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index acf61a4de373..e80fe0ca2421 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -251,6 +251,9 @@ void __init gic_init(void)
 panic("Failed to initialize the GIC drivers\n");
 /* Clear LR mask for cpu0 */
 clear_cpu_lr_mask();
+
+if ( gic_number_lines() > VGIC_MAX_IRQS )
+printk(XENLOG_WARNING "Maximum number of vGIC IRQs exceeded\n");
 }
 
 void send_SGI_mask(const cpumask_t *cpumask, enum gic_sgi sgi)
diff --git a/xen/arch/arm/include/asm/vgic.h b/xen/arch/arm/include/asm/vgic.h
index e309dca1ad01..c549e5840bfa 100644
--- a/xen/arch/arm/include/asm/vgic.h
+++ b/xen/arch/arm/include/asm/vgic.h
@@ -329,6 +329,9 @@ extern void vgic_check_inflight_irqs_pending(struct vcpu *v,
  */
 #define vgic_num_irqs(d)((d)->arch.vgic.nr_spis + 32)
 
+/* Maximum number of IRQs supported by vGIC */
+#define VGIC_MAX_IRQS 992U
+
 /*
  * Allocate a guest VIRQ
  *  - spi == 0 => allocate a PPI. It will be the same on every vCPU
-- 
2.25.1

[PATCH 0/2] arm: better handling of nr_spis

2025-03-11 Thread Michal Orzel

Refer:
https://lore.kernel.org/xen-devel/20250306220343.203047-6-jason.andr...@amd.com/T/#mc15ab00940d5964b18b3d6092869dae6f85af1dc

Michal Orzel (2):
  xen/arm: Improve handling of nr_spis
  tools/arm: Reject configuration with incorrect nr_spis value

 docs/man/xl.cfg.5.pod.in| 13 +
 tools/libs/light/libxl_arm.c|  6 ++
 xen/arch/arm/dom0less-build.c   |  2 +-
 xen/arch/arm/domain_build.c |  9 ++---
 xen/arch/arm/gic.c  |  3 +++
 xen/arch/arm/include/asm/vgic.h |  3 +++
 6 files changed, 20 insertions(+), 16 deletions(-)

-- 
2.25.1

1 2 3 >

1 - 100 of 273 matches

Mail list logo