Re: Re: Memory corruption bug with Xen PV Dom0 and BOSS-S1 RAID card

2025-02-19 Thread Paweł Srokosz
Hello,

> So the issue doesn't happen on debug=y builds? That's unexpected.  I would
> expect the opposite, that some code in Linux assumes that pfn + 1 == mfn +
> 1, and hence breaks when the relation is reversed.

It was also surprising for me but I think the key thing is that debug=y
causes whole mapping to be reversed so each PFN lands on completely different
MFN e.g. MFN=0x130 is mapped to PFN=0x20e50c in ndebug, but in debug
it's mapped to PFN=0x5F. I guess that's why I can't reproduce the
problem.

> Can you see if you can reproduce with dom0-iommu=strict in the Xen command
> line?

Unfortunately, it doesn't help. But I have few more observations.

Firstly, I checked the "xen-mfndump dump-m2p" output and found that misread
blocks are mapped to suspiciously round MFNs. I have different versions of
Xen and Linux kernel on each machine and I see some coincidence.

I'm writing few huge files without Xen to ensure that they have been written
correctly (because under Xen both read and writeback is affected). Then I'm
booting to Xen, memory-mapping the files and reading each page. I see that when 
block is corrupted, it is mapped on round MFN e.g. pfn=0x5095d9/mfn=0x160, 
another on pfn=0x4095d9/mfn=0x150 etc.

On another machine with different Linux/Xen version these faults appear on
pfn=0x20e50c/mfn=0x130, pfn=0x30e50c/mfn=0x140 etc.

I also noticed that during read of page that is mapped to
pfn=0x20e50c/mfn=0x130, I'm getting these faults from DMAR:

```
(XEN) [VT-D]DMAR:[DMA Write] Request device [:65:00.0] fault addr 12
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) [VT-D]DMAR:[DMA Write] Request device [:65:00.0] fault addr 121000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) [VT-D]DMAR:[DMA Write] Request device [:65:00.0] fault addr 126000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) [VT-D]DMAR:[DMA Write] Request device [:65:00.0] fault addr 128000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) [VT-D]DMAR:[DMA Write] Request device [:65:00.0] fault addr 129000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) [VT-D]DMAR:[DMA Write] Request device [:65:00.0] fault addr 12a000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) [VT-D]DMAR:[DMA Write] Request device [:65:00.0] fault addr 12c000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
```

and every time I'm dropping the cache and reading this region, I'm getting
DMAR faults on few random addresses from 12-12f000 range (I guess 
MFNs 0x120-12f). MFNs 0x120-0x12000ff are not mapped to any PFN in
Dom0 (based on xen-mfndump output.). 

On the other hand, I'm not getting these DMAR faults while reading other 
regions.
Also I can't trigger the bug with reversed Dom0 mapping, even if I fill the page
cache with reads.

Thank you,
Paweł



Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-19 Thread Dave Hansen
On 2/19/25 07:13, Valentin Schneider wrote:
>> Maybe I missed part of the discussion though. Is VMEMMAP your only
>> concern? I would have guessed that the more generic vmalloc()
>> functionality would be harder to pin down.
> Urgh, that'll teach me to send emails that late - I did indeed mean the
> vmalloc() range, not at all VMEMMAP. IIUC *neither* are present in the user
> kPTI page table and AFAICT the page table swap is done before the actual 
> vmap'd
> stack (CONFIG_VMAP_STACK=y) gets used.

OK, so rewriting your question... ;)

> So what if the vmalloc() range *isn't* in the CR3 tree when a CPU is
> executing in userspace?

The LDT and maybe the PEBS buffers are the only implicit supervisor
accesses to vmalloc()'d memory that I can think of. But those are both
handled specially and shouldn't ever get zapped while in use. The LDT
replacement has its own IPIs separate from TLB flushing.

But I'm actually not all that worried about accesses while actually
running userspace. It's that "danger zone" in the kernel between entry
and when the TLB might have dangerous garbage in it.

BTW, I hope this whole thing is turned off on 32-bit. There, we can
actually take and handle faults on the vmalloc() area. If you get one of
those faults in your "danger zone", it'll start running page fault code
which will branch out to god-knows-where and certainly isn't noinstr.



[PATCH v3 2/3] x86/iommu: account for IOMEM caps when populating dom0 IOMMU page-tables

2025-02-19 Thread Roger Pau Monne
The current code in arch_iommu_hwdom_init() kind of open-codes the same
MMIO permission ranges that are added to the hardware domain ->iomem_caps.
Avoid this duplication and use ->iomem_caps in arch_iommu_hwdom_init() to
filter which memory regions should be added to the dom0 IOMMU page-tables.

Note the IO-APIC and MCFG page(s) must be set as not accessible for a PVH
dom0, otherwise the internal Xen emulation for those ranges won't work.
This requires adjustments in dom0_setup_permissions().

The call to pvh_setup_mmcfg() in dom0_construct_pvh() must now strictly be
done ahead of setting up dom0 permissions, so take the opportunity to also
put it inside the existing is_hardware_domain() region.

Also the special casing of E820_UNUSABLE regions no longer needs to be done
in arch_iommu_hwdom_init(), as those regions are already blocked in
->iomem_caps and thus would be removed from the rangeset as part of
->iomem_caps processing in arch_iommu_hwdom_init().  The E820_UNUSABLE
regions below 1Mb are not removed from ->iomem_caps, that's a slight
difference for the IOMMU created page-tables, but the aim is to allow
access to the same memory either from the CPU or the IOMMU page-tables.

Since ->iomem_caps already takes into account the domain max paddr, there's
no need to remove any regions past the last address addressable by the
domain, as applying ->iomem_caps would have already taken care of that.

Suggested-by: Jan Beulich 
Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Expand commit message re E820_UNUSABLE.
 - Use vpci_mmcfg_deny_access().
 - Remove max() from map_subtract_iomemcap().
 - Use has_vioapic() instead of is_hvm_domain().

Changes since v1:
 - New in this version.
---
 xen/arch/x86/dom0_build.c   | 11 -
 xen/arch/x86/hvm/dom0_build.c   | 14 +++---
 xen/arch/x86/hvm/io.c   |  6 +--
 xen/arch/x86/include/asm/hvm/io.h   |  4 +-
 xen/drivers/passthrough/x86/iommu.c | 67 -
 5 files changed, 49 insertions(+), 53 deletions(-)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index aec596997d5d..a735e3650c28 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -558,7 +558,9 @@ int __init dom0_setup_permissions(struct domain *d)
 for ( i = 0; i < nr_ioapics; i++ )
 {
 mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
-if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+/* If emulating IO-APIC(s) make sure the base address is unmapped. */
+if ( has_vioapic(d) ||
+ !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
 rc |= iomem_deny_access(d, mfn, mfn);
 }
 /* MSI range. */
@@ -599,6 +601,13 @@ int __init dom0_setup_permissions(struct domain *d)
 rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
 }
 
+if ( has_vpci(d) )
+/*
+ * TODO: runtime added MMCFG regions are not checked to make sure they
+ * don't overlap with already mapped regions, thus preventing trapping.
+ */
+rc |= vpci_mmcfg_deny_access(d);
+
 return rc;
 }
 
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index ce5b5c31b318..6a4453103a9a 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1323,6 +1323,13 @@ int __init dom0_construct_pvh(struct boot_info *bi, 
struct domain *d)
 
 if ( is_hardware_domain(d) )
 {
+/*
+ * MMCFG initialization must be performed before setting domain
+ * permissions, as the MCFG areas must not be part of the domain IOMEM
+ * accessible regions.
+ */
+pvh_setup_mmcfg(d);
+
 /*
  * Setup permissions early so that calls to add MMIO regions to the
  * p2m as part of vPCI setup don't fail due to permission checks.
@@ -1335,13 +1342,6 @@ int __init dom0_construct_pvh(struct boot_info *bi, 
struct domain *d)
 }
 }
 
-/*
- * NB: MMCFG initialization needs to be performed before iommu
- * initialization so the iommu code can fetch the MMCFG regions used by the
- * domain.
- */
-pvh_setup_mmcfg(d);
-
 /*
  * Craft dom0 physical memory map and set the paging allocation. This must
  * be done before the iommu initializion, since iommu initialization code
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index db726b38177b..de6ee6c4dd4d 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -363,14 +363,14 @@ static const struct hvm_mmcfg *vpci_mmcfg_find(const 
struct domain *d,
 return NULL;
 }
 
-int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset 
*r)
+int __hwdom_init vpci_mmcfg_deny_access(struct domain *d)
 {
 const struct hvm_mmcfg *mmcfg;
 
 list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
 {
-int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
-   PFN_DOWN(mmcfg->add

Re: [PATCH] xen/arm: Create GIC node using the node name from host dt

2025-02-19 Thread Stefano Stabellini
On Wed, 19 Feb 2025, Michal Orzel wrote:
> At the moment the GIC node we create for hwdom has a name
> "interrupt-controller". Change it so that we use the same name as the
> GIC node from host device tree. This is done for at least 2 purposes:
> 1) The convention in DT spec is that a node name with "reg" property
> is formed "node-name@unit-address".
> 2) With DT overlay feature, many overlays refer to the GIC node using
> the symbol under __symbols__ that we copy to hwdom 1:1. With the name
> changed, the symbol is no longer valid and requires error prone manual
> change by the user.
> 
> The unit-address part of the node name always refers to the first
> address in the "reg" property which in case of GIC, always refers to
> GICD and hwdom uses host memory layout.
> 
> Signed-off-by: Michal Orzel 

Reviewed-by: Stefano Stabellini 

While this fix changes behavior for everyone, so it is risky at RC5, it
also fixes bugs with DT overlays, but that is an experimental feature. I
am in two minds whether it should go in right now or not. Maybe I would
wait until 4.20 is out and commit when the tree reopens.


> ---
>  xen/arch/arm/domain_build.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 7b47abade196..e760198d8609 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1615,6 +1615,7 @@ static int __init make_gic_node(const struct domain *d, 
> void *fdt,
>  int res = 0;
>  const void *addrcells, *sizecells;
>  u32 addrcells_len, sizecells_len;
> +const char *name;
>  
>  /*
>   * Xen currently supports only a single GIC. Discard any secondary
> @@ -1628,7 +1629,11 @@ static int __init make_gic_node(const struct domain 
> *d, void *fdt,
>  
>  dt_dprintk("Create gic node\n");
>  
> -res = fdt_begin_node(fdt, "interrupt-controller");
> +/* Use the same name as the GIC node in host device tree */
> +name = strrchr(gic->full_name, '/');
> +name = name ? name + 1 : gic->full_name;
> +
> +res = fdt_begin_node(fdt, name);
>  if ( res )
>  return res;
>  
> -- 
> 2.25.1
> 



Re: [PATCH v3 5/5] docs: add basic CI documentation

2025-02-19 Thread Stefano Stabellini
On Wed, 19 Feb 2025, Marek Marczykowski-Górecki wrote:
> Include info how to get access/enable hardware runners and how to select
> individual jobs.
> 
> Signed-off-by: Marek Marczykowski-Górecki 
> ---
> new in v3
> Definitely there can be more content here, but lets start somewhere.
> ---
>  docs/index.rst   |  1 +
>  docs/misc/ci.rst | 35 +++
>  2 files changed, 36 insertions(+)
>  create mode 100644 docs/misc/ci.rst
> 
> diff --git a/docs/index.rst b/docs/index.rst
> index 1bb8d02ea357..bd87d736b9c3 100644
> --- a/docs/index.rst
> +++ b/docs/index.rst
> @@ -51,6 +51,7 @@ kind of development environment.
> :maxdepth: 2
>  
> hypervisor-guide/index
> +   misc/ci
>  
>  
>  Unsorted documents
> diff --git a/docs/misc/ci.rst b/docs/misc/ci.rst
> new file mode 100644
> index ..2803574fa2c0
> --- /dev/null
> +++ b/docs/misc/ci.rst
> @@ -0,0 +1,35 @@
> +.. SPDX-License-Identifier: CC-BY-4.0
> +
> +Continuous Integration
> +==
> +
> +Xen Project uses Gitlab-CI for automated testing. Test pipelines for official
> +staging branches are at
> +``_. Developers can
> +schedule test pipelines in their repositories under
> +``_.
> +
> +Hardware runners
> +
> +
> +Some of the tests are using dedicated hardware runners. Those are not 
> available freely, but the access is granted to individual developers. To get 
> access to them, ask on the ``#XenDevel:matrix.org`` Matrix channel.
> +After getting access to relevant runners, few extra changes are necessary in 
> settings of the relevant "xen" gitlab project (under your 
> ``_ namespace):
> +
> +1. Go to Settings -> CI/CD, expand the "Runners" section and enable relevant 
> runners for your project.
> +2. Expand "Variables" section and add ``QUBES_JOBS=true`` variable for Qubes 
> runners, and ``XILINX_JOBS=true`` for Xilinx runners.

Let's not mention XILINX_JOBS=true as Xilinx runners are not generally
available. I can fix on commit.

Reviewed-by: Stefano Stabellini 



> +3. Go to Settings -> Repository, expand "Branch rules" section and add a 
> rule for protected branches - only those branches will get tests on the 
> hardware runners. It's okay to use a pattern for branch name, and it's okay 
> to allow force push.
> +
> +Selecting individual tests
> +**
> +
> +Normally, all build and test jobs are scheduled in a pipeline. When working 
> on a specific patches, it is sometimes useful to run only jobs relevant for 
> the current work - both to save time and to save CI resources. This can be 
> done by seeting ``SELECTED_JOBS_ONLY`` variable when starting the pipeline. 
> The variable holds a regular expression, enclosed with ``/`` that matches 
> jobs to be included. The variable can be set via the gitlab.com web UI or 
> directly when pushing changes to gitlab::
> +
> +   git push -o ci.variable=SELECTED_JOBS_ONLY="/job1|job2/"
> +
> +Note if a test job requires some build job, both need to be included in the 
> regex. For example, ``adl-smoke-x86-64-gcc-debug`` requires 
> ``alpine-3.18-gcc-debug``, so to run just this test the command will look 
> like this::
> +
> +   git push -o 
> ci.variable=SELECTED_JOBS_ONLY="/adl-smoke-x86-64-gcc-debug|alpine-3.18-gcc-debug/"
> +
> +More details at 
> ``_.
> +
> +Alternatively, irrelevant jobs can be removed from respective yaml files in 
> ``automation/gitlab-ci`` by adding temporary commit on top of the branch.
> -- 
> git-series 0.9.1
> 

Re: [PATCH v3 3/5] automation: allow selecting individual jobs via CI variables

2025-02-19 Thread Stefano Stabellini
On Wed, 19 Feb 2025, Marek Marczykowski-Górecki wrote:
> Debugging sometimes involves running specific jobs on different
> versions. It's useful to easily avoid running all of the not interesting
> ones (for given case) to save both time and CI resources. Doing so used
> to require changing the yaml files, usually in several places.
> Ease this step by adding SELECTED_JOBS_ONLY variable that takes a regex.
> Note that one needs to satisfy job dependencies on their own (for
> example if a test job needs a build job, that specific build job
> needs to be included too).
> 
> The variable can be specified via Gitlab web UI when scheduling a
> pipeline, but it can be also set when doing git push directly:
> 
> git push -o ci.variable=SELECTED_JOBS_ONLY="/job1|job2/"
> 
> More details at https://docs.gitlab.co.jp/ee/user/project/push_options.html
> 
> The variable needs to include regex for selecting jobs, including
> enclosing slashes.
> A coma/space separated list of jobs to select would be friendlier UX,
> but unfortunately that is not supported:
> https://gitlab.com/gitlab-org/gitlab/-/issues/209904 (note the proposed
> workaround doesn't work for job-level CI_JOB_NAME).
> On the other hand, the regex is more flexible (one can select for
> example all arm32 jobs).
> 
> Signed-off-by: Marek Marczykowski-Górecki 

While I am not super happy about this (it is not your fault, it is due
to the limitations of the Gitlab YAML languange) and I don't think I
would use SELECTED_JOBS_ONLY probably, if you find it useful maybe
others will too. It is not a big maintenance burden. So:

Acked-by: Stefano Stabellini 


> ---
> Changes in v3:
> - include variable in Web UI for starting pipeline
> ---
>  .gitlab-ci.yml  |  2 ++
>  automation/gitlab-ci/build.yaml |  6 ++
>  automation/gitlab-ci/test.yaml  | 14 ++
>  3 files changed, 22 insertions(+)
> 
> diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> index 5a9b8b722838..b3beb2ff9ddf 100644
> --- a/.gitlab-ci.yml
> +++ b/.gitlab-ci.yml
> @@ -1,5 +1,7 @@
>  variables:
>XEN_REGISTRY: registry.gitlab.com/xen-project/xen
> +  SELECTED_JOBS_ONLY:
> +description: "Regex to select only some jobs, must be enclosed with /. 
> For example /job1|job2/"
>  
>  workflow:
>rules:
> diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
> index 35e224366f62..f12de00a164a 100644
> --- a/automation/gitlab-ci/build.yaml
> +++ b/automation/gitlab-ci/build.yaml
> @@ -12,6 +12,12 @@
>- '*/*.log'
>  when: always
>needs: []
> +  rules:
> +  - if: $SELECTED_JOBS_ONLY && $CI_JOB_NAME =~ $SELECTED_JOBS_ONLY
> +when: always
> +  - if: $SELECTED_JOBS_ONLY
> +when: never
> +  - when: on_success
>  
>  .gcc-tmpl:
>variables: &gcc
> diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> index c21a37933881..93632f1f9204 100644
> --- a/automation/gitlab-ci/test.yaml
> +++ b/automation/gitlab-ci/test.yaml
> @@ -1,6 +1,11 @@
>  .test-jobs-common:
>stage: test
>image: ${XEN_REGISTRY}/${CONTAINER}
> +  rules:
> +  - if: $SELECTED_JOBS_ONLY && $CI_JOB_NAME =~ $SELECTED_JOBS_ONLY
> +  - if: $SELECTED_JOBS_ONLY
> +when: never
> +  - when: on_success
>  
>  .arm64-test-needs: &arm64-test-needs
>- alpine-3.18-arm64-rootfs-export
> @@ -99,6 +104,9 @@
>- '*.dtb'
>  when: always
>rules:
> +- if: $SELECTED_JOBS_ONLY && $CI_JOB_NAME =~ $SELECTED_JOBS_ONLY
> +- if: $SELECTED_JOBS_ONLY
> +  when: never
>  - if: $XILINX_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"
>tags:
>  - xilinx
> @@ -117,6 +125,9 @@
>- '*.log'
>  when: always
>rules:
> +- if: $SELECTED_JOBS_ONLY && $CI_JOB_NAME =~ $SELECTED_JOBS_ONLY
> +- if: $SELECTED_JOBS_ONLY
> +  when: never
>  - if: $XILINX_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"
>tags:
>  - xilinx
> @@ -137,6 +148,9 @@
>- '*.log'
>  when: always
>rules:
> +- if: $SELECTED_JOBS_ONLY && $CI_JOB_NAME =~ $SELECTED_JOBS_ONLY
> +- if: $SELECTED_JOBS_ONLY
> +  when: never
>  - if: $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"
>tags:
>  - qubes-hw2
> -- 
> git-series 0.9.1
> 

Re: xen/x86: resolve the last 3 MISRA R16.6 violations

2025-02-19 Thread Stefano Stabellini
On Wed, 19 Feb 2025, Jan Beulich wrote:
> On 18.02.2025 22:42, Stefano Stabellini wrote:
> > On Tue, 18 Feb 2025, Jan Beulich wrote:
> >> On 18.02.2025 00:12, Stefano Stabellini wrote:
> >>> On Mon, 17 Feb 2025, Jan Beulich wrote:
>  On 15.02.2025 03:16, Stefano Stabellini wrote:
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -3797,22 +3797,14 @@ uint64_t hvm_get_reg(struct vcpu *v, unsigned 
> > int reg)
> >  {
> >  ASSERT(v == current || !vcpu_runnable(v));
> >  
> > -switch ( reg )
> > -{
> > -default:
> > -return alternative_call(hvm_funcs.get_reg, v, reg);
> > -}
> > +return alternative_call(hvm_funcs.get_reg, v, reg);
> >  }
> >  
> >  void hvm_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
> >  {
> >  ASSERT(v == current || !vcpu_runnable(v));
> >  
> > -switch ( reg )
> > -{
> > -default:
> > -return alternative_vcall(hvm_funcs.set_reg, v, reg, val);
> > -}
> > +return alternative_vcall(hvm_funcs.set_reg, v, reg, val);
> >  }
> 
>  Both of these were, iirc, deliberately written using switch(), to ease
>  possible future changes.
> >>>
> >>> To be honest, I do not see any value in the way they are currently
> >>> written. However, if you prefer, I can add a deviation for this, with
> >>> one SAF comment for each of these two. The reason for the deviation
> >>> would be "deliberate to ease possible future change". Please let me know
> >>> how you would like to proceed.
> >>
> >> Well, best next thing you can do is seek input from the person who has
> >> written that code, i.e. Andrew.
> > 
> > Andrew wrote in chat that he is OK with a deviation and he can live with
> > a SAF deviation. Here is the patch.
> > 
> > 
> > ---
> > xen/x86: resolve the last 3 MISRA R16.6 violations
> > 
> > MISRA R16.6 states that "Every switch statement shall have at least two
> > switch-clauses". There are only 3 violations left on x86 (zero on ARM).
> > 
> > One of them is only a violation depending on the kconfig configuration.
> > So deviate it instead with a SAF comment.
> > 
> > Two of them are deliberate to enable future additions. Deviate them as
> > such.
> > 
> > Signed-off-by: Stefano Stabellini 
> 
> Acked-by: Jan Beulich 

Thanks!

Oleksii, may I ask for a release-ack?



Re: [PATCH v3] xen/dom0less: support for vcpu affinity

2025-02-19 Thread Stefano Stabellini
On Wed, 19 Feb 2025, Julien Grall wrote:
> Hi Stefano,
> 
> On 18/02/2025 20:29, Stefano Stabellini wrote:
> > Add vcpu affinity to the dom0less bindings. Example:
> > 
> >  dom1 {
> >  ...
> >  cpus = <4>;
> >  vcpu0 {
> > compatible = "xen,vcpu-affinity";
> 
> I would prefer if the compatible is "xen,vcpu". This would allow us to extend
> for anything that vCPU specific. I would also look less odd if someone ...
> 
> > id = <0>;
> > hard-affinity = "4-7";
> 
> ... doesn't specify hard-affinity which is optional.
> 
> >  };
> >  vcpu1 {
> > compatible = "xen,vcpu-affinity";
> > id = <1>;
> > hard-affinity = "0-3";
> 
> NIT: This example is exactly the same as vcpu0. How about changing to a list
> of range/single value? This would make clear that a mix is possible.
> 
> >  };
> >  vcpu2 {
> > compatible = "xen,vcpu-affinity";
> > id = <2>;
> > hard-affinity = "1,6";
> >  };
> >  ...
> > 
> > Note that the property hard-affinity is optional. It is possible to add
> > other properties in the future not only to specify soft affinity, but
> > also to provide more precise methods for configuring affinity. For
> > instance, on ARM the MPIDR could be use to specify the pCPU. For now, it
> > is left to the future.
> > 
> > Signed-off-by: Xenia Ragiadakou 
> > Signed-off-by: Stefano Stabellini 
> > ---
> > Changes in v3:
> > - improve commit message
> > - improve binding doc
> > - add panic on invalid pCPU
> > - move parsing to a separate function
> > 
> > diff --git a/docs/misc/arm/device-tree/booting.txt
> > b/docs/misc/arm/device-tree/booting.txt
> > index 9c881baccc..10e55c825c 100644
> > --- a/docs/misc/arm/device-tree/booting.txt
> > +++ b/docs/misc/arm/device-tree/booting.txt
> > @@ -324,6 +324,27 @@ The ramdisk sub-node has the following properties:
> >   property because it will be created by the UEFI stub on boot.
> >   This option is needed only when UEFI boot is used.
> >   +Under the "xen,domain" compatible node, it is possible optionally to add
> > +one or more sub-nodes to configure vCPU affinity. The vCPU affinity
> > +sub-node has the following properties:
> > +
> > +- compatible
> > +
> > +"xen,vcpu-affinity"
> > +
> > +- id
> > +
> > +A 32-bit integer that specifies the vCPU id. 0 is the first vCPU.
> > +The last vCPU is cpus-1, where "cpus" is the number of vCPUs
> > +specified with the "cpus" property under the "xen,domain" node.
> 
> I think it would be worth mentioning that each node must have a unique ID. It
> is not necessary to check in the code, but it would avoid the question of what
> happen if someone specify twice the VCPU with different affinity.
> 
> > +
> > +- hard-affinity
> > +
> > +Optional. A string specifying the hard affinity configuration for the
> > +vCPU: a comma-separated list of pCPUs or ranges of pCPUs is used.
> > +Ranges are hyphen-separated intervals (such as `0-4`) and are inclusive
> > +on both sides. The numbers refer to pCPU ids.
> 
> Technically MPIDRs are pCPUs ID. So I would add "logical" in front of pCPU ids
> to make clear what IDs are we talking about
> 
> > +
> > Example
> >   ===
> 
> No update to the example?
> 
> > diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
> > index 49d1f14d65..e364820189 100644
> > --- a/xen/arch/arm/dom0less-build.c
> > +++ b/xen/arch/arm/dom0less-build.c
> > @@ -810,6 +810,68 @@ static int __init construct_domU(struct domain *d,
> >   return rc;
> >   }
> >   +static void __init domain_vcpu_affinity(struct domain *d,
> > +const struct dt_device_node *node)
> > +{> +const char *hard_affinity_str = NULL;
> > +struct dt_device_node *np;
> > +uint32_t val;
> > +int rc;
> 
> Can you expain why 'rc', 'val', 'hard_affinity_str' are defined outside of the
> loop when ...
> 
> > +
> > +dt_for_each_child_node(node, np)
> > +{
> > +const char *s;
> > +struct vcpu *v;
> > +cpumask_t affinity;
> 
> ... they are not? Yet they have the same property (i.e. only called within the
> loop).
> 
> > +
> > +if ( !dt_device_is_compatible(np, "xen,vcpu-affinity") )
> > +continue;
> > +
> > +if ( !dt_property_read_u32(np, "id", &val) )
> 
> Looking at the binding you wrote, "id" is mandatory. So I think we should
> throw an error if it is not present.
> 
> > +continue;
> > +> +if ( val >= d->max_vcpus )
> > +panic("Invalid vcpu_id %u for domain %s\n", val,
> > dt_node_name(node));
> 
> NIT: Maybe print the maximum number of vCPUs? This would make easier to know
> what's wrong with the ID.
> 
> > +
> > +v = d->vcpu[val];
> > +rc

Re: [PATCH] x86/MCE-telem: adjust cookie definition

2025-02-19 Thread Stefano Stabellini
On Wed, 19 Feb 2025, Andrew Cooper wrote:
> On 19/02/2025 10:00 am, Jan Beulich wrote:
> > struct mctelem_ent is opaque outside of mcetelem.c; the cookie
> > abstraction exists - afaict - just to achieve this opaqueness. Then it
> > is irrelevant though which kind of pointer mctelem_cookie_t resolves to.
> > IOW we can as well use struct mctelem_ent there, allowing to remove the
> > casts from COOKIE2MCTE() and MCTE2COOKIE(). Their removal addresses
> > Misra C:2012 rule 11.2 ("Conversions shall not be performed between a
> > pointer to an incomplete type and any other type") violations.
> >
> > No functional change intended.
> >
> > Signed-off-by: Jan Beulich 
> 
> https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/9181587757
> 
> Eclair does appear to be happy with this approach (assuming I stripped
> down to only checking R11.2 correctly, and making it fatal).
> 
> For the change itself, it's an almost identical binary, differing only
> in the string section which I expect means some embedded line numbers.
> 
> Reviewed-by: Andrew Cooper 

Thank you very much Jan for writing the patch, and thank you Andrew for
running the pipeline. It is great that resolves all the 11.2 issues!

Oleksii, may I ask for a release-ack? I'll follow up with a patch to
mark 11.2 as clean.



Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-19 Thread Dave Hansen
On 2/19/25 09:08, Joel Fernandes wrote:
>> Pretty much so yeah. That is, *if* there such a vmalloc'd address access in
>> early entry code - testing says it's not the case, but I haven't found a
>> way to instrumentally verify this.
> Ok, thanks for confirming. Maybe there is an address sanitizer way of 
> verifying,
> but yeah it is subtle and there could be more than one way of solving it. Too
> much 'fun' 😉
For debugging, you could just make a copy of part or all of the page
tables and run the NOHZ_FULL tasks from those while they're in
userspace. Then, instead of flushing the TLB in the deferred work, you
switch over to the "real" page tables.

That would _behave_ like a CPU with a big TLB and really old, crusty TLB
entries from the last time the kernel ran.

BTW, the other option for all of this is just to say that if you want
IPI-free TLB flushing that you need to go buy some hardware with it as
opposed to all of this complexity.



Re: [PATCH 0/2] code style exercise: Drivers folder samples

2025-02-19 Thread Stefano Stabellini
On Wed, 19 Feb 2025, Oleksandr Andrushchenko wrote:
> Yes, I do agree. But only if we talk about having an automated
> code style check now (which is definitely the goal at some time).
> Before that we could still use the tool to take all that good that
> it does and manually prepare a set of patches to fix those
> code style issues which we like.

Let's explore this option a bit more. Let's say that we write down our
coding style in plain English by enhancing CODING_STYLE. This newer
CODING_STYLE has also a corresponding clang-format configuration.

Now, we cannot use clang-format to reformat everything because, as we
are discovering with this example, clang-format is overzealous and
changes too many things. Instead, we take "inspiration" from
clang-format's output but we manually prepare a patch series to apply
code style changes to Xen as the coding style today is not uniform
across the code base and also not always conforming to CODING_STYLE.

At this point, we have already made an improvement to CODING_STYLEe, and
made the Xen coding style more uniform across the codebase.

But do we also have an automated coding style checker that we can use?
Can we use clang-format to check new patches coming in? Or would
clang-format flag too many things as coding style errors?

If it is flagging too many things as error, so we cannot use it as
automated checker, is it still worth going through the exercise? Yes, we
make some improvement we haven't reached the goal of having an automated
code style checker.



Re: [PATCH v2 1/2] xen/list: avoid UB in list iterators

2025-02-19 Thread Stefano Stabellini
On Wed, 19 Feb 2025, Andrew Cooper wrote:
> On 19/02/2025 2:18 pm, Juergen Gross wrote:
> > The list_for_each_entry*() iterators are testing for having reached the
> > end of the list in a way which relies on undefined behavior: the
> > iterator (being a pointer to the struct of a list element) is advanced
> > and only then tested to have reached not the next element, but the list
> > head. This results in the list head being addressed via a list element
> > pointer, which is undefined, in case the list elements have a higher
> > alignment than the list head.
> >
> > Avoid that by testing for the end of the list before advancing the
> > iterator. In case of having reached the end of the list, set the
> > iterator to NULL and use that for stopping the loop. This has the
> > additional advantage of not leaking the iterator pointing to something
> > which isn't a list element past the loop.
> >
> > There is one case in the Xen code where the iterator is used after
> > the loop: it is tested to be non-NULL to indicate the loop has run
> > until reaching the end of the list. This case is modified to use the
> > iterator being NULL for indicating the end of the list has been
> > reached.
> >
> > Reported-by: Andrew Cooper 
> > Signed-off-by: Juergen Gross 
> > Release-Acked-by: Oleksii Kurochko 
> 
> I agree there's an issue here, but as said before, I do not agree with
> this patch.
> 
> For starters, bloat-o-meter on a random top-of-tree build says
> 
>     add/remove: 8/1 grow/shrink: 112/68 up/down: 4314/-2855 (1459)
> 
> which is a horrible overhead for a case where the sequence of
> instructions are correct (only the C level types are a problem) and ...
> 
> > ---
> > No proper Fixes: tag, as this bug predates Xen's git and mercurial
> > history.
> > V2:
> > - fix one use case (Jan Beulich)
> > - let list_is_first() return bool, rename parameter (Jan Beulich)
> > - paranthesize iterator when used as non-NULL test (Jan Beulich)
> > - avoid dereferencing NULL in the safe iterators (Jan Beulich)
> > ---
> >  xen/drivers/passthrough/x86/hvm.c |   3 +-
> 
> ... the need for this adjustment being discovered after-the-fact means
> it's a very risky change at this juncture in the release.

I have not reviewed the patch in enough detail to form an opinion on the
approach yet. However, I want to note that I also don't think that this
series should be committed at this stage of the release process. It
would be better to wait for the 4.21 release cycle.

Re: [BUG?] Wrong RC reported during 'make install'

2025-02-19 Thread Jan Beulich
On 19.02.2025 19:04, Andrew Cooper wrote:
> Oleksii has asked for RC5, and we're overdue.  I'm intending to commit:
> 
> diff --git a/xen/Makefile b/xen/Makefile
> index 65b460e2b480..4e37fff92514 100644
> --- a/xen/Makefile
> +++ b/xen/Makefile
> @@ -6,7 +6,7 @@ this-makefile := $(call lastword,$(MAKEFILE_LIST))
>  # All other places this is stored (eg. compile.h) should be autogenerated.
>  export XEN_VERSION   = 4
>  export XEN_SUBVERSION    = 20
> -export XEN_EXTRAVERSION ?= -rc$(XEN_VENDORVERSION)
> +export XEN_EXTRAVERSION ?= .0-rc5$(XEN_VENDORVERSION)
>  export XEN_FULLVERSION   =
> $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
>  -include xen-version
>  
> in order to make that happen properly, and finally have the tarball be a
> straight `git archive` invocation.
> 
> Does this sound acceptable?

Yes. It's not optimal that the file then needs touching for every RC, but
then again it's also no different from what we do on every stable release.

Jan



Re: [BUG?] Wrong RC reported during 'make install'

2025-02-19 Thread Stefano Stabellini
On Wed, 19 Feb 2025, Andrew Cooper wrote:
> On 13/02/2025 7:54 am, Jan Beulich wrote:
> > On 13.02.2025 01:51, Andrew Cooper wrote:
> >> On 12/02/2025 9:52 pm, Stefano Stabellini wrote:
> >>> On Wed, 12 Feb 2025, Oleksii Kurochko wrote:
>  Hello everyone,
> 
>  During the installation of Xen on an ARM server machine from the source 
>  code,
>  I found that the wrong release candidate (rc) is being used:
>    $ make install  
>  install -m0644 -p xen //boot/xen-4.20-rc  
>  install: cannot remove ‘//boot/xen-4.20-rc’: Permission denied  
>  make[1]: *** [Makefile:507: _install] Error 1
>  My expectation is that it should be xen-4.20-rc4.
> 
>  I'm not sure if this behavior is intentional or if users are expected to 
>  set
>  the XEN_VENDORVERSION variable manually to ensure the correct release
>  candidate number.
> 
>  In my opinion, we should set the proper release candidate number after
>  "xen-4.20-rc" automatically.
> 
>  Does anyone have any thoughts or suggestions on how to resolve this 
>  issue?
> >>> Hi Oleksii,
> >>>
> >>> I did a quick test and I see exactly the same on x86 as well. This patch
> >>> fixes it, but then it would need someone to update the RC number in
> >>> xen/Makefile every time a new RC is made.
> >>>
> >>> ---
> >>> xen: add RC version number to xen filename
> >>>
> >>> Signed-off-by: Stefano Stabellini 
> >> This is a direct consequence of the request to keep XEN_EXTRAVERSION at
> >> "-rc" throughout the release cycle.
> >>
> >> I'm having to manually edit that simply to create the tarballs
> >> correctly, which in turn means that the tarball isn't a byte-for-byte
> >> identical `git archive` of the tag it purports to be.
> > Just for my understanding - may I ask why this editing is necessary?
> > Other release technicians never mentioned the (indeed undesirable)
> > need to do so.
> 
> I did point it out.  I also needed to get RC1 cut and everyone had left
> the office.
> 
> xen.git$ make src-tarball-release && tar tf dist/xen-4.20-rc.tar.gz | head
> 
> Source tarball in /home/andrew/xen.git/dist/xen-4.20-rc.tar.gz
> xen-4.20-rc/
> xen-4.20-rc/.github/
> xen-4.20-rc/.github/workflows/
> xen-4.20-rc/.github/workflows/coverity.yml
> xen-4.20-rc/.gitarchive-info
> xen-4.20-rc/Makefile
> xen-4.20-rc/stubdom/
> xen-4.20-rc/stubdom/Makefile
> xen-4.20-rc/stubdom/grub/
> xen-4.20-rc/stubdom/grub/Makefile
> 
> mktarball uses `$(MAKE) -C xen xenversion` which uses XEN_EXTRAVERSION.
> 
> XEN_EXTRAVERSION needs both the .0 and the RC number in order to make
> the tarball with the correct name and correct top directory.
> 
> What I didn't anticipate was that, while editing XEN_EXTRAVERSION
> locally gets a proper tarball, the contents within the tarball are
> nonspecific as to the RC, hence Oleksii's observation.
> 
> It also means the tarball wasn't a straight `git archive` of the tree,
> which is one of the reasons behind taking out the sub-repos.
> >> I'd not twigged that it mean the builds from the tarballs reported false
> >> information too.
> >>
> >> While I appreciate the wish to not have a commit per RC bumping
> >> XEN_EXTRAVERSION, I think the avoidance of doing so is creating more
> >> problems than it solves, and we should revert back to the prior way of
> >> doing things.
> > Sure, if it truly is getting in the way, then it needs re-considering.
> > Just to mention it: Then the question is going to be though whether
> > really to merely adjust XEN_EXTRAVERSION, or whether instead to do
> > this consistently in all (three?) places.
> 
> It's only XEN_EXTRAVERSION which needs to change (I think).
> 
> I think README and SUPPORT.md are fine to say as they are, for
> generically -rc.
> 
> 
> Oleksii has asked for RC5, and we're overdue.  I'm intending to commit:
> 
> diff --git a/xen/Makefile b/xen/Makefile
> index 65b460e2b480..4e37fff92514 100644
> --- a/xen/Makefile
> +++ b/xen/Makefile
> @@ -6,7 +6,7 @@ this-makefile := $(call lastword,$(MAKEFILE_LIST))
>  # All other places this is stored (eg. compile.h) should be autogenerated.
>  export XEN_VERSION   = 4
>  export XEN_SUBVERSION    = 20
> -export XEN_EXTRAVERSION ?= -rc$(XEN_VENDORVERSION)
> +export XEN_EXTRAVERSION ?= .0-rc5$(XEN_VENDORVERSION)
>  export XEN_FULLVERSION   =
> $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
>  -include xen-version
>  
> in order to make that happen properly, and finally have the tarball be a
> straight `git archive` invocation.
> 
> Does this sound acceptable?

Yes, looks fine. Please go ahead.

Re: [PATCH v6] Avoid crash calling PrintErrMesg from efi_multiboot2

2025-02-19 Thread Jan Beulich
On 19.02.2025 17:34, Frediano Ziglio wrote:
> On Mon, Feb 17, 2025 at 4:56 PM Jan Beulich  wrote:
>> On 17.02.2025 17:52, Frediano Ziglio wrote:
>>> On Mon, Feb 17, 2025 at 4:41 PM Andrew Cooper  
>>> wrote:
 On 17/02/2025 4:31 pm, Jan Beulich wrote:
> On 17.02.2025 17:26, Frediano Ziglio wrote:
>> --- a/xen/common/efi/efi-common.mk
>> +++ b/xen/common/efi/efi-common.mk
>> @@ -2,6 +2,7 @@ EFIOBJ-y := boot.init.o pe.init.o ebmalloc.o runtime.o
>>  EFIOBJ-$(CONFIG_COMPAT) += compat.o
>>
>>  CFLAGS-y += -fshort-wchar
>> +CFLAGS-y += -fno-jump-tables
>>  CFLAGS-y += -iquote $(srctree)/common/efi
>>  CFLAGS-y += -iquote $(srcdir)
> Do source files other than boot.c really need this? Do any other files 
> outside
> of efi/ maybe also need this (iirc this point was made along with the v5 
> comment
> you got)?

 Every TU reachable from efi_multiboot2() needs this, because all of
 those have logic executed at an incorrect virtual address.
>>>
>>> I assume all the files in efi directory will deal with EFI boot so
>>> it's good to have the option enabled for the files inside the
>>> directory. The only exception seems runtime.c file.
>>
>> And compat.c, being a wrapper around runtime.c.
>>
>>> About external dependencies not sure how to detect them (beside
>>> looking at all symbols).
>>
>> Which raises the question whether we don't need to turn on that option
>> globally, without (as we have it now) any condition.
> 
> I would avoid adding a potential option that could affect performances
> so badly at the moment.
> These changes are pretty contained.
> I would merge this patch and check any external dependencies as a follow up.

Well. It's a judgement call to the maintainers. If I were them, I'd demand
that Andrew's remark be addressed, one way or another.

Jan



RE: [PATCH v2 02/11] xen/x86: introduce new sub-hypercall to propagate CPPC data

2025-02-19 Thread Penny, Zheng
[AMD Official Use Only - AMD Internal Distribution Only]

Hi,

> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, February 18, 2025 10:49 PM
> To: Penny, Zheng 
> Cc: Huang, Ray ; Andryuk, Jason
> ; Andrew Cooper ;
> Roger Pau Monné ; Anthony PERARD
> ; Orzel, Michal ; Julien
> Grall ; Stefano Stabellini ; xen-
> de...@lists.xenproject.org
> Subject: Re: [PATCH v2 02/11] xen/x86: introduce new sub-hypercall to 
> propagate
> CPPC data
>
> On 18.02.2025 07:05, Penny, Zheng wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: Monday, February 17, 2025 3:39 PM
> >>
> >> On 17.02.2025 08:20, Penny, Zheng wrote:
> >>> [AMD Official Use Only - AMD Internal Distribution Only]
> >>
> >> Btw, boiler plates like this aren't really liked on public mailing
> >> lists, for being contrary to the purpose of such lists.
>
> You did read this, didn't you? I ask because the same boilerplate keeps 
> appearing in
> your mails.
>
>  -Original Message-
>  From: Jan Beulich 
>  Sent: Tuesday, February 11, 2025 7:10 PM
> 
>  On 06.02.2025 09:32, Penny Zheng wrote:
> > +{
> > +int ret = 0, cpuid;
> > +struct processor_pminfo *pm_info;
> > +
> > +cpuid = get_cpu_id(acpi_id);
> > +if ( cpuid < 0 || !cppc_data )
> > +{
> > +ret = -EINVAL;
> > +goto out;
> > +}
> > +if ( cpufreq_verbose )
> > +printk("Set CPU acpi_id(%d) cpuid(%d) CPPC State info:\n",
> > +   acpi_id, cpuid);
> > +
> > +pm_info = processor_pminfo[cpuid];
> > +if ( !pm_info )
> > +{
> > +pm_info = xvzalloc(struct processor_pminfo);
> > +if ( !pm_info )
> > +{
> > +ret = -ENOMEM;
> > +goto out;
> > +}
> > +processor_pminfo[cpuid] = pm_info;
> > +}
> > +pm_info->acpi_id = acpi_id;
> > +pm_info->id = cpuid;
> > +pm_info->cppc_data = *cppc_data;
> > +
> > +if ( cpufreq_verbose )
> > +print_CPPC(&pm_info->cppc_data);
> > +
> > + out:
> > +return ret;
> > +}
> 
>  What's the interaction between the data set by set_px_pminfo() and
>  the data set here? In particular, what's going to happen if both
>  functions come into play for the same CPU? Shouldn't there be some
>  sanity
> >> checks?
> >>>
> >>> Yes, I've considered this and checked ACPI spec. I'll refer them here:
> >>> ```
> >>> If the platform supports CPPC, the _CPC object must exist under all
> >>> processor
> >> objects.
> >>> That is, OSPM is not expected to support mixed mode (CPPC & legacy
> >>> PSS,
> >> _PCT, _PPC) operation.
> >>> ```
> >>> See
> >>> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Contr
> >>> ol
> >>> .html?highlight=cppc#power-performance-and-throttling-state-dependen
> >>> ci es So CPUs could have both _CPC and legacy P-state info in ACPI
> >>> for each entry, they just can't have mixed-mode Maybe we shall add
> >>> sanity check to see if _CPC exists, it shall exist for all pcpus?
> >>
> >> Maybe, but that wasn't the point of my remark.
> >>
> >> Properly behaving Dom0 should probably be passing only one of the two
> >> possible pieces of information. Yet maybe we'd better sanity check _that_?
> >> (I don't recall seeing Linux kernel side patches yet; if they were
> >> posted somewhere, they may at least partly address my concern.)
> >>
> >
> > In my linux patch,
> > https://lore.kernel.org/lkml/20241204082430.469092-1-Penny.Zheng@amd.c
> > om/T/ I only did zero-value check in xen_processor_get_perf_caps(), Do
> > you think in that place, I shall add more strict sanity check, like
> > the register value shall not be zero and also must smaller than UINT8_T?
> > Or we just do the above check in Xen part when receiving the data?
>
> Value range checking is nice to have in Dom0, but the same checking needs 
> doing
> in the hypervisor anyway. But that isn't what my comment was about. What I'm
> asking is how it is being made sure that we won't have to deal with a mix of
> traditional and CPPC data in the hypervisor.
>

Are you suggesting that we only do either set_cppc_pminfo or set_px_pminfo?
Only one side data get set to avoid the consequence of mixture.

> Jan


Re: [PATCH v3] xen/dom0less: support for vcpu affinity

2025-02-19 Thread Oleksii Kurochko


On 2/18/25 9:29 PM, Stefano Stabellini wrote:

Add vcpu affinity to the dom0less bindings. Example:

 dom1 {
 ...
 cpus = <4>;
 vcpu0 {
compatible = "xen,vcpu-affinity";
id = <0>;
hard-affinity = "4-7";
 };
 vcpu1 {
compatible = "xen,vcpu-affinity";
id = <1>;
hard-affinity = "0-3";
 };
 vcpu2 {
compatible = "xen,vcpu-affinity";
id = <2>;
hard-affinity = "1,6";
 };
 ...

Note that the property hard-affinity is optional. It is possible to add
other properties in the future not only to specify soft affinity, but
also to provide more precise methods for configuring affinity. For
instance, on ARM the MPIDR could be use to specify the pCPU. For now, it
is left to the future.


I think we want this to add to CHANGELOG.

Thanks.

~ Oleksii




Signed-off-by: Xenia Ragiadakou
Signed-off-by: Stefano Stabellini
---
Changes in v3:
- improve commit message
- improve binding doc
- add panic on invalid pCPU
- move parsing to a separate function

diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt
index 9c881baccc..10e55c825c 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -324,6 +324,27 @@ The ramdisk sub-node has the following properties:
  property because it will be created by the UEFI stub on boot.
  This option is needed only when UEFI boot is used.
  
+Under the "xen,domain" compatible node, it is possible optionally to add

+one or more sub-nodes to configure vCPU affinity. The vCPU affinity
+sub-node has the following properties:
+
+- compatible
+
+"xen,vcpu-affinity"
+
+- id
+
+A 32-bit integer that specifies the vCPU id. 0 is the first vCPU.
+The last vCPU is cpus-1, where "cpus" is the number of vCPUs
+specified with the "cpus" property under the "xen,domain" node.
+
+- hard-affinity
+
+Optional. A string specifying the hard affinity configuration for the
+vCPU: a comma-separated list of pCPUs or ranges of pCPUs is used.
+Ranges are hyphen-separated intervals (such as `0-4`) and are inclusive
+on both sides. The numbers refer to pCPU ids.
+
  
  Example

  ===
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 49d1f14d65..e364820189 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -810,6 +810,68 @@ static int __init construct_domU(struct domain *d,
  return rc;
  }
  
+static void __init domain_vcpu_affinity(struct domain *d,

+const struct dt_device_node *node)
+{
+const char *hard_affinity_str = NULL;
+struct dt_device_node *np;
+uint32_t val;
+int rc;
+
+dt_for_each_child_node(node, np)
+{
+const char *s;
+struct vcpu *v;
+cpumask_t affinity;
+
+if ( !dt_device_is_compatible(np, "xen,vcpu-affinity") )
+continue;
+
+if ( !dt_property_read_u32(np, "id", &val) )
+continue;
+
+if ( val >= d->max_vcpus )
+panic("Invalid vcpu_id %u for domain %s\n", val, 
dt_node_name(node));
+
+v = d->vcpu[val];
+rc = dt_property_read_string(np, "hard-affinity", &hard_affinity_str);
+if ( rc < 0 )
+continue;
+
+s = hard_affinity_str;
+cpumask_clear(&affinity);
+while ( *s != '\0' )
+{
+unsigned int start, end;
+
+start = simple_strtoul(s, &s, 0);
+
+if ( *s == '-' )/* Range */
+{
+s++;
+end = simple_strtoul(s, &s, 0);
+}
+else/* Single value */
+end = start;
+
+if ( end >= nr_cpu_ids )
+panic("Invalid pCPU %u for domain %s\n", end, 
dt_node_name(node));
+
+for ( ; start <= end; start++ )
+cpumask_set_cpu(start, &affinity);
+
+if ( *s == ',' )
+s++;
+else if ( *s != '\0' )
+break;
+}
+
+rc = vcpu_set_hard_affinity(v, &affinity);
+if ( rc )
+panic("vcpu%d: failed to set hard affinity\n", v->vcpu_id);
+}
+}
+
  void __init create_domUs(void)
  {
  struct dt_device_node *node;
@@ -992,6 +1054,8 @@ void __init create_domUs(void)
  if ( rc )
  panic("Could not set up domain %s (rc = %d)\n",
dt_node_name(node), rc);
+
+domain_vcpu_affinity(d, node);
  }
  }
  


Re: [PATCH v2 02/11] xen/x86: introduce new sub-hypercall to propagate CPPC data

2025-02-19 Thread Jan Beulich
On 19.02.2025 09:36, Penny, Zheng wrote:
>> -Original Message-
>> From: Jan Beulich 
>> Sent: Tuesday, February 18, 2025 10:49 PM
>>
>> On 18.02.2025 07:05, Penny, Zheng wrote:
 -Original Message-
 From: Jan Beulich 
 Sent: Monday, February 17, 2025 3:39 PM

 On 17.02.2025 08:20, Penny, Zheng wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]

 Btw, boiler plates like this aren't really liked on public mailing
 lists, for being contrary to the purpose of such lists.
>>
>> You did read this, didn't you? I ask because the same boilerplate keeps 
>> appearing in
>> your mails.
>>
>> -Original Message-
>> From: Jan Beulich 
>> Sent: Tuesday, February 11, 2025 7:10 PM
>>
>> On 06.02.2025 09:32, Penny Zheng wrote:
>>> +{
>>> +int ret = 0, cpuid;
>>> +struct processor_pminfo *pm_info;
>>> +
>>> +cpuid = get_cpu_id(acpi_id);
>>> +if ( cpuid < 0 || !cppc_data )
>>> +{
>>> +ret = -EINVAL;
>>> +goto out;
>>> +}
>>> +if ( cpufreq_verbose )
>>> +printk("Set CPU acpi_id(%d) cpuid(%d) CPPC State info:\n",
>>> +   acpi_id, cpuid);
>>> +
>>> +pm_info = processor_pminfo[cpuid];
>>> +if ( !pm_info )
>>> +{
>>> +pm_info = xvzalloc(struct processor_pminfo);
>>> +if ( !pm_info )
>>> +{
>>> +ret = -ENOMEM;
>>> +goto out;
>>> +}
>>> +processor_pminfo[cpuid] = pm_info;
>>> +}
>>> +pm_info->acpi_id = acpi_id;
>>> +pm_info->id = cpuid;
>>> +pm_info->cppc_data = *cppc_data;
>>> +
>>> +if ( cpufreq_verbose )
>>> +print_CPPC(&pm_info->cppc_data);
>>> +
>>> + out:
>>> +return ret;
>>> +}
>>
>> What's the interaction between the data set by set_px_pminfo() and
>> the data set here? In particular, what's going to happen if both
>> functions come into play for the same CPU? Shouldn't there be some
>> sanity
 checks?
>
> Yes, I've considered this and checked ACPI spec. I'll refer them here:
> ```
> If the platform supports CPPC, the _CPC object must exist under all
> processor
 objects.
> That is, OSPM is not expected to support mixed mode (CPPC & legacy
> PSS,
 _PCT, _PPC) operation.
> ```
> See
> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Contr
> ol
> .html?highlight=cppc#power-performance-and-throttling-state-dependen
> ci es So CPUs could have both _CPC and legacy P-state info in ACPI
> for each entry, they just can't have mixed-mode Maybe we shall add
> sanity check to see if _CPC exists, it shall exist for all pcpus?

 Maybe, but that wasn't the point of my remark.

 Properly behaving Dom0 should probably be passing only one of the two
 possible pieces of information. Yet maybe we'd better sanity check _that_?
 (I don't recall seeing Linux kernel side patches yet; if they were
 posted somewhere, they may at least partly address my concern.)

>>>
>>> In my linux patch,
>>> https://lore.kernel.org/lkml/20241204082430.469092-1-Penny.Zheng@amd.c
>>> om/T/ I only did zero-value check in xen_processor_get_perf_caps(), Do
>>> you think in that place, I shall add more strict sanity check, like
>>> the register value shall not be zero and also must smaller than UINT8_T?
>>> Or we just do the above check in Xen part when receiving the data?
>>
>> Value range checking is nice to have in Dom0, but the same checking needs 
>> doing
>> in the hypervisor anyway. But that isn't what my comment was about. What I'm
>> asking is how it is being made sure that we won't have to deal with a mix of
>> traditional and CPPC data in the hypervisor.
> 
> Are you suggesting that we only do either set_cppc_pminfo or set_px_pminfo?
> Only one side data get set to avoid the consequence of mixture.

That's one of the possible ways to avoid mixing things. Then again Dom0
won't know what cpufreq= was passed to Xen, and hence it may not be in
the position to decide what to upload. It hence may need to be Xen to
simply ignore the uploading of data it's not going to use.

Jan



[PATCH v3 0/3] xen: Fix usage of devices behind a VMD bridge

2025-02-19 Thread Roger Pau Monne
Hello,

The following series should fix the usage of devices behind a VMD bridge
when running Linux as a Xen PV hardware domain (dom0).  I've only been
able to test PV. I think PVH should also work but I don't have hardware
capable of testing it right now.

I don't expect the first two patches to be problematic, the last patch
is likely to be more controversial.  I've tested it internally and
didn't see any issues, but my testing of PV mode is mostly limited to
dom0.

Thanks, Roger.

Roger Pau Monne (3):
  xen/pci: Do not register devices with segments >= 0x1
  PCI: vmd: Disable MSI remapping bypass under Xen
  PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag

 arch/x86/pci/xen.c   |  8 ++--
 drivers/pci/controller/vmd.c | 20 +++
 drivers/pci/msi/msi.c| 37 
 drivers/xen/pci.c| 32 +++
 include/linux/msi.h  |  3 ++-
 kernel/irq/msi.c |  2 +-
 6 files changed, 78 insertions(+), 24 deletions(-)

-- 
2.46.0




[PATCH v3 1/3] xen/pci: Do not register devices with segments >= 0x10000

2025-02-19 Thread Roger Pau Monne
The current hypercall interface for doing PCI device operations always uses
a segment field that has a 16 bit width.  However on Linux there are buses
like VMD that hook up devices into the PCI hierarchy at segment >= 0x1,
after the maximum possible segment enumerated in ACPI.

Attempting to register or manage those devices with Xen would result in
errors at best, or overlaps with existing devices living on the truncated
equivalent segment values.  Note also that the VMD segment numbers are
arbitrarily assigned by the OS, and hence there would need to be some
negotiation between Xen and the OS to agree on how to enumerate VMD
segments and devices behind them.

Skip notifying Xen about those devices.  Given how VMD bridges can
multiplex interrupts on behalf of devices behind them there's no need for
Xen to be aware of such devices for them to be usable by Linux.

Signed-off-by: Roger Pau Monné 
Acked-by: Juergen Gross 
---
Changes since v2:
 - Capitalize subject.
 - Add extra comments to note thet 16bit segments value hypercall interface
   limitation.

Changes since v1:
 - Adjust commit message width to 75 columns.
 - Expand commit message.
---
 drivers/xen/pci.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 416f231809cb..bfe07adb3e3a 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -43,6 +43,18 @@ static int xen_add_device(struct device *dev)
pci_mcfg_reserved = true;
}
 #endif
+
+   if (pci_domain_nr(pci_dev->bus) >> 16) {
+   /*
+* The hypercall interface is limited to 16bit PCI segment
+* values, do not attempt to register devices with Xen in
+* segments greater or equal than 0x1.
+*/
+   dev_info(dev,
+"not registering with Xen: invalid PCI segment\n");
+   return 0;
+   }
+
if (pci_seg_supported) {
DEFINE_RAW_FLEX(struct physdev_pci_device_add, add, optarr, 1);
 
@@ -149,6 +161,16 @@ static int xen_remove_device(struct device *dev)
int r;
struct pci_dev *pci_dev = to_pci_dev(dev);
 
+   if (pci_domain_nr(pci_dev->bus) >> 16) {
+   /*
+* The hypercall interface is limited to 16bit PCI segment
+* values.
+*/
+   dev_info(dev,
+"not unregistering with Xen: invalid PCI segment\n");
+   return 0;
+   }
+
if (pci_seg_supported) {
struct physdev_pci_device device = {
.seg = pci_domain_nr(pci_dev->bus),
@@ -182,6 +204,16 @@ int xen_reset_device(const struct pci_dev *dev)
.flags = PCI_DEVICE_RESET_FLR,
};
 
+   if (pci_domain_nr(dev->bus) >> 16) {
+   /*
+* The hypercall interface is limited to 16bit PCI segment
+* values.
+*/
+   dev_info(&dev->dev,
+"unable to notify Xen of device reset: invalid PCI 
segment\n");
+   return 0;
+   }
+
return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_reset, &device);
 }
 EXPORT_SYMBOL_GPL(xen_reset_device);
-- 
2.46.0




[PATCH v3 2/3] PCI: vmd: Disable MSI remapping bypass under Xen

2025-02-19 Thread Roger Pau Monne
MSI remapping bypass (directly configuring MSI entries for devices on the
VMD bus) won't work under Xen, as Xen is not aware of devices in such bus,
and hence cannot configure the entries using the pIRQ interface in the PV
case, and in the PVH case traps won't be setup for MSI entries for such
devices.

Until Xen is aware of devices in the VMD bus prevent the
VMD_FEAT_CAN_BYPASS_MSI_REMAP capability from being used when running as
any kind of Xen guest.

The MSI remapping bypass is an optional feature of VMD bridges, and hence
when running under Xen it will be masked and devices will be forced to
redirect its interrupts from the VMD bridge.  That mode of operation must
always be supported by VMD bridges and works when Xen is not aware of
devices behind the VMD bridge.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Adjust patch subject.
 - Adjust code comment.

Changes since v1:
 - Add xen header.
 - Expand comment.
---
 drivers/pci/controller/vmd.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 9d9596947350..e619accca49d 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 
+#include 
+
 #include 
 
 #define VMD_CFGBAR 0
@@ -970,6 +972,24 @@ static int vmd_probe(struct pci_dev *dev, const struct 
pci_device_id *id)
struct vmd_dev *vmd;
int err;
 
+   if (xen_domain()) {
+   /*
+* Xen doesn't have knowledge about devices in the VMD bus
+* because the config space of devices behind the VMD bridge is
+* not known to Xen, and hence Xen cannot discover or configure
+* them in any way.
+*
+* Bypass of MSI remapping won't work in that case as direct
+* write by Linux to the MSI entries won't result in functional
+* interrupts, as Xen is the entity that manages the host
+* interrupt controller and must configure interrupts.  However
+* multiplexing of interrupts by the VMD bridge will work under
+* Xen, so force the usage of that mode which must always be
+* supported by VMD bridges.
+*/
+   features &= ~VMD_FEAT_CAN_BYPASS_MSI_REMAP;
+   }
+
if (resource_size(&dev->resource[VMD_CFGBAR]) < (1 << 20))
return -ENOMEM;
 
-- 
2.46.0




[PATCH v3 3/3] PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag

2025-02-19 Thread Roger Pau Monne
Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both
MSI and MSI-X entries globally, regardless of the IRQ chip they are using.
Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over
event channels, to prevent PCI code from attempting to toggle the maskbit,
as it's Xen that controls the bit.

However, the pci_msi_ignore_mask being global will affect devices that use
MSI interrupts but are not routing those interrupts over event channels
(not using the Xen pIRQ chip).  One example is devices behind a VMD PCI
bridge.  In that scenario the VMD bridge configures MSI(-X) using the
normal IRQ chip (the pIRQ one in the Xen case), and devices behind the
bridge configure the MSI entries using indexes into the VMD bridge MSI
table.  The VMD bridge then demultiplexes such interrupts and delivers to
the destination device(s).  Having pci_msi_ignore_mask set in that scenario
prevents (un)masking of MSI entries for devices behind the VMD bridge.

Move the signaling of no entry masking into the MSI domain flags, as that
allows setting it on a per-domain basis.  Set it for the Xen MSI domain
that uses the pIRQ chip, while leaving it unset for the rest of the
cases.

Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and
with Xen dropping usage the variable is unneeded.

This fixes using devices behind a VMD bridge on Xen PV hardware domains.

Albeit Devices behind a VMD bridge are not known to Xen, that doesn't mean
Linux cannot use them.  By inhibiting the usage of
VMD_FEAT_CAN_BYPASS_MSI_REMAP and the removal of the pci_msi_ignore_mask
bodge devices behind a VMD bridge do work fine when use from a Linux Xen
hardware domain.  That's the whole point of the series.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Thomas Gleixner 
Acked-by: Juergen Gross 
---
Changes since v2:
 - Fix subject line.

Changes since v1:
 - Fix build.
 - Expand commit message.
---
 arch/x86/pci/xen.c|  8 ++--
 drivers/pci/msi/msi.c | 37 +
 include/linux/msi.h   |  3 ++-
 kernel/irq/msi.c  |  2 +-
 4 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 0f2fe524f60d..b8755cde2419 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -436,7 +436,8 @@ static struct msi_domain_ops xen_pci_msi_domain_ops = {
 };
 
 static struct msi_domain_info xen_pci_msi_domain_info = {
-   .flags  = MSI_FLAG_PCI_MSIX | MSI_FLAG_FREE_MSI_DESCS | 
MSI_FLAG_DEV_SYSFS,
+   .flags  = MSI_FLAG_PCI_MSIX | MSI_FLAG_FREE_MSI_DESCS |
+ MSI_FLAG_DEV_SYSFS | MSI_FLAG_NO_MASK,
.ops= &xen_pci_msi_domain_ops,
 };
 
@@ -484,11 +485,6 @@ static __init void xen_setup_pci_msi(void)
 * in allocating the native domain and never use it.
 */
x86_init.irqs.create_pci_msi_domain = xen_create_pci_msi_domain;
-   /*
-* With XEN PIRQ/Eventchannels in use PCI/MSI[-X] masking is solely
-* controlled by the hypervisor.
-*/
-   pci_msi_ignore_mask = 1;
 }
 
 #else /* CONFIG_PCI_MSI */
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 2f647cac4cae..4c8c2b57b5f6 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -10,12 +10,12 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../pci.h"
 #include "msi.h"
 
 int pci_msi_enable = 1;
-int pci_msi_ignore_mask;
 
 /**
  * pci_msi_supported - check whether MSI may be enabled on a device
@@ -285,6 +285,8 @@ static void pci_msi_set_enable(struct pci_dev *dev, int 
enable)
 static int msi_setup_msi_desc(struct pci_dev *dev, int nvec,
  struct irq_affinity_desc *masks)
 {
+   const struct irq_domain *d = dev_get_msi_domain(&dev->dev);
+   const struct msi_domain_info *info = d->host_data;
struct msi_desc desc;
u16 control;
 
@@ -295,8 +297,7 @@ static int msi_setup_msi_desc(struct pci_dev *dev, int nvec,
/* Lies, damned lies, and MSIs */
if (dev->dev_flags & PCI_DEV_FLAGS_HAS_MSI_MASKING)
control |= PCI_MSI_FLAGS_MASKBIT;
-   /* Respect XEN's mask disabling */
-   if (pci_msi_ignore_mask)
+   if (info->flags & MSI_FLAG_NO_MASK)
control &= ~PCI_MSI_FLAGS_MASKBIT;
 
desc.nvec_used  = nvec;
@@ -604,12 +605,15 @@ static void __iomem *msix_map_region(struct pci_dev *dev,
  */
 void msix_prepare_msi_desc(struct pci_dev *dev, struct msi_desc *desc)
 {
+   const struct irq_domain *d = dev_get_msi_domain(&dev->dev);
+   const struct msi_domain_info *info = d->host_data;
+
desc->nvec_used = 1;
desc->pci.msi_attrib.is_msix= 1;
desc->pci.msi_attrib.is_64  = 1;
desc->pci.msi_attrib.default_irq= dev->irq;
desc->pci.mask_base = dev->msix_base;
-   desc->pc

Re: [PATCH v3 02/25] drm/dumb-buffers: Provide helper to set pitch and size

2025-02-19 Thread Thomas Zimmermann

Hi

Am 18.02.25 um 20:32 schrieb Geert Uytterhoeven:
[...]

+args->bpp);
+   fallthrough;
+   case 12:
+   case 15:
+   case 30: /* see drm_gem_afbc_get_bpp() */
+   case 10:

Perhaps keep them sorted numerically?


The first block comes from the afbc helper; the second block from Mesa; 
hence the odd order. I'll reorder and comment each case individually.





+   case 64: /* used by Mesa */
+   pitch = args->width * DIV_ROUND_UP(args->bpp, SZ_8);
+   break;
+   }
+   }
+
+   if (!pitch || pitch > U32_MAX)
+   return -EINVAL;
+
+   args->pitch = pitch;
+
+   return drm_mode_align_dumb(args, pitch_align, size_align);
+}
+EXPORT_SYMBOL(drm_mode_size_dumb);
+
  int drm_mode_create_dumb(struct drm_device *dev,
  struct drm_mode_create_dumb *args,
  struct drm_file *file_priv)
diff --git a/include/drm/drm_dumb_buffers.h b/include/drm/drm_dumb_buffers.h
new file mode 100644
index ..6fe36004b19d
--- /dev/null
+++ b/include/drm/drm_dumb_buffers.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: MIT */
+
+#ifndef __DRM_DUMB_BUFFERS_H__
+#define __DRM_DUMB_BUFFERS_H__
+
+struct drm_device;
+struct drm_mode_create_dumb;
+
+int drm_mode_size_dumb(struct drm_device *dev,
+  struct drm_mode_create_dumb *args,
+  unsigned long pitch_align,
+  unsigned long size_align);
+
+#endif
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index c082810c08a8..eea09103b1a6 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -1058,7 +1058,7 @@ struct drm_mode_crtc_page_flip_target {
   * struct drm_mode_create_dumb - Create a KMS dumb buffer for scanout.
   * @height: buffer height in pixels
   * @width: buffer width in pixels
- * @bpp: bits per pixel
+ * @bpp: color mode
   * @flags: must be zero
   * @handle: buffer object handle
   * @pitch: number of bytes between two consecutive lines
@@ -1066,6 +1066,50 @@ struct drm_mode_crtc_page_flip_target {
   *
   * User-space fills @height, @width, @bpp and @flags. If the IOCTL succeeds,
   * the kernel fills @handle, @pitch and @size.
+ *
+ * The value of @bpp is a color-mode number describing a specific format
+ * or a variant thereof. The value often corresponds to the number of bits
+ * per pixel for most modes, although there are exceptions. Each color mode
+ * maps to a DRM format plus a number of modes with similar pixel layout.
+ * Framebuffer layout is always linear.
+ *
+ * Support for all modes and formats is optional. Even if dumb-buffer
+ * creation with a certain color mode succeeds, it is not guaranteed that
+ * the DRM driver supports any of the related formats. Most drivers support
+ * a color mode of 32 with a format of DRM_FORMAT_XRGB on their primary
+ * plane.
+ *
+ * ++++
+ * | Color mode | Framebuffer format | Compatibles|
+ * ++++
+ * | 32 |  * DRM_FORMAT_XRGB |  * DRM_FORMAT_XBGR |
+ * |||  * DRM_FORMAT_RGBX |
+ * |||  * DRM_FORMAT_BGRX |
+ * ++++
+ * | 24 |  * DRM_FORMAT_RGB888   |  * DRM_FORMAT_BGR888   |
+ * ++++
+ * | 16 |  * DRM_FORMAT_RGB565   |  * DRM_FORMAT_BGR565   |
+ * ++++
+ * | 15 |  * DRM_FORMAT_XRGB1555 |  * DRM_FORMAT_XBGR1555 |
+ * |||  * DRM_FORMAT_RGBX1555 |
+ * |||  * DRM_FORMAT_BGRX1555 |
+ * ++++
+ * |  8 |  * DRM_FORMAT_C8   |  * DRM_FORMAT_R8   |

+ DRM_FORMAT_D8? (and 4/2/1 below)


Right, missed that.



And DRM_FORMAT_Y8, if/when Tomi's series introducing that is accepted...


Sure, if it is compatible, it can also go into the third column.

Best regards
Thomas




+ * ++++
+ * |  4 |  * DRM_FORMAT_C4   |  * DRM_FORMAT_R4   |
+ * ++++
+ * |  2 |  * DRM_FORMAT_C2   |  * DRM_FORMAT_R2   |
+ * ++++
+ * |  1 |  * DRM_FORMAT_C1   |  * DRM_FORMAT_R1   |
+ * ++++
+ *
+ * Color modes of 10, 12, 15, 30 and 64 are only supported for use by
+ * legacy user space. Please don't use them in new code. Other modes
+ * are not support.
+ *
+ * 

Re: xen/x86: resolve the last 3 MISRA R16.6 violations

2025-02-19 Thread Nicola Vetrini

On 2025-02-18 22:42, Stefano Stabellini wrote:

On Tue, 18 Feb 2025, Jan Beulich wrote:

On 18.02.2025 00:12, Stefano Stabellini wrote:
> On Mon, 17 Feb 2025, Jan Beulich wrote:
>> On 15.02.2025 03:16, Stefano Stabellini wrote:
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -3797,22 +3797,14 @@ uint64_t hvm_get_reg(struct vcpu *v, unsigned int 
reg)
>>>  {
>>>  ASSERT(v == current || !vcpu_runnable(v));
>>>
>>> -switch ( reg )
>>> -{
>>> -default:
>>> -return alternative_call(hvm_funcs.get_reg, v, reg);
>>> -}
>>> +return alternative_call(hvm_funcs.get_reg, v, reg);
>>>  }
>>>
>>>  void hvm_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
>>>  {
>>>  ASSERT(v == current || !vcpu_runnable(v));
>>>
>>> -switch ( reg )
>>> -{
>>> -default:
>>> -return alternative_vcall(hvm_funcs.set_reg, v, reg, val);
>>> -}
>>> +return alternative_vcall(hvm_funcs.set_reg, v, reg, val);
>>>  }
>>
>> Both of these were, iirc, deliberately written using switch(), to ease
>> possible future changes.
>
> To be honest, I do not see any value in the way they are currently
> written. However, if you prefer, I can add a deviation for this, with
> one SAF comment for each of these two. The reason for the deviation
> would be "deliberate to ease possible future change". Please let me know
> how you would like to proceed.

Well, best next thing you can do is seek input from the person who has
written that code, i.e. Andrew.


Andrew wrote in chat that he is OK with a deviation and he can live 
with

a SAF deviation. Here is the patch.


---
xen/x86: resolve the last 3 MISRA R16.6 violations

MISRA R16.6 states that "Every switch statement shall have at least two
switch-clauses". There are only 3 violations left on x86 (zero on ARM).

One of them is only a violation depending on the kconfig configuration.
So deviate it instead with a SAF comment.

Two of them are deliberate to enable future additions. Deviate them as
such.

Signed-off-by: Stefano Stabellini 



Looks good to me, from an ECLAIR point of view. Did you have a chance to 
run a pipeline on it to confirm that the SAF comments are recognized 
correctly?


With that,

Reviewed-by: Nicola Vetrini 


diff --git a/docs/misra/safe.json b/docs/misra/safe.json
index b8a4f878ea..3d68b59169 100644
--- a/docs/misra/safe.json
+++ b/docs/misra/safe.json
@@ -92,6 +92,22 @@
 },
 {
 "id": "SAF-11-safe",
+"analyser": {
+"eclair": "MC3A2.R16.6"
+},
+"name": "Rule 16.6: single clause due to kconfig",
+"text": "A switch statement with a single switch clause 
because other switch clauses are disabled in a given kconfig is safe."

+},
+{
+"id": "SAF-12-safe",
+"analyser": {
+"eclair": "MC3A2.R16.6"
+},
+"name": "Rule 16.6: single clause due to future 
expansion",
+"text": "A switch statement with a single switch clause to 
purposely enable future additions of new cases is safe."

+},
+{
+"id": "SAF-13-safe",
 "analyser": {},
 "name": "Sentinel",
 "text": "Next ID to be used"
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 39e39ce4ce..0f0630769b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3797,6 +3797,7 @@ uint64_t hvm_get_reg(struct vcpu *v, unsigned int 
reg)

 {
 ASSERT(v == current || !vcpu_runnable(v));

+/* SAF-12-safe */
 switch ( reg )
 {
 default:
@@ -3808,6 +3809,7 @@ void hvm_set_reg(struct vcpu *v, unsigned int 
reg, uint64_t val)

 {
 ASSERT(v == current || !vcpu_runnable(v));

+/* SAF-12-safe */
 switch ( reg )
 {
 default:
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 87b30ce4df..dca11a613d 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -436,6 +436,7 @@ unsigned long get_stack_trace_bottom(unsigned long 
sp)


 static unsigned long get_shstk_bottom(unsigned long sp)
 {
+/* SAF-11-safe */
 switch ( get_stack_page(sp) )
 {
 #ifdef CONFIG_XEN_SHSTK


--
Nicola Vetrini, B.Sc.
Software Engineer
BUGSENG (https://bugseng.com)
LinkedIn: https://www.linkedin.com/in/nicola-vetrini-a42471253



Re: [PATCH v3 06/25] drm/armada: Compute dumb-buffer sizes with drm_mode_size_dumb()

2025-02-19 Thread Thomas Zimmermann

Hi

Am 18.02.25 um 16:57 schrieb Russell King (Oracle):

On Tue, Feb 18, 2025 at 03:23:29PM +0100, Thomas Zimmermann wrote:

Call drm_mode_size_dumb() to compute dumb-buffer scanline pitch and
buffer size. No alignment required.

Signed-off-by: Thomas Zimmermann 
Cc: Russell King 

armada_pitch() does have some special alignment (it aligns the pitch to
128 bytes). I've no idea what drm_mode_size_dumb() does. Can you check
whether it does the same please?

If it doesn't, then this patch is incorrect.


Indeed, I should have noticed. Will be fixed in the next iteration. 
Thanks for the review.


Best regards
Thomas





--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)




Re: xen/x86: resolve the last 3 MISRA R16.6 violations

2025-02-19 Thread Jan Beulich
On 18.02.2025 22:42, Stefano Stabellini wrote:
> On Tue, 18 Feb 2025, Jan Beulich wrote:
>> On 18.02.2025 00:12, Stefano Stabellini wrote:
>>> On Mon, 17 Feb 2025, Jan Beulich wrote:
 On 15.02.2025 03:16, Stefano Stabellini wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3797,22 +3797,14 @@ uint64_t hvm_get_reg(struct vcpu *v, unsigned int 
> reg)
>  {
>  ASSERT(v == current || !vcpu_runnable(v));
>  
> -switch ( reg )
> -{
> -default:
> -return alternative_call(hvm_funcs.get_reg, v, reg);
> -}
> +return alternative_call(hvm_funcs.get_reg, v, reg);
>  }
>  
>  void hvm_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
>  {
>  ASSERT(v == current || !vcpu_runnable(v));
>  
> -switch ( reg )
> -{
> -default:
> -return alternative_vcall(hvm_funcs.set_reg, v, reg, val);
> -}
> +return alternative_vcall(hvm_funcs.set_reg, v, reg, val);
>  }

 Both of these were, iirc, deliberately written using switch(), to ease
 possible future changes.
>>>
>>> To be honest, I do not see any value in the way they are currently
>>> written. However, if you prefer, I can add a deviation for this, with
>>> one SAF comment for each of these two. The reason for the deviation
>>> would be "deliberate to ease possible future change". Please let me know
>>> how you would like to proceed.
>>
>> Well, best next thing you can do is seek input from the person who has
>> written that code, i.e. Andrew.
> 
> Andrew wrote in chat that he is OK with a deviation and he can live with
> a SAF deviation. Here is the patch.
> 
> 
> ---
> xen/x86: resolve the last 3 MISRA R16.6 violations
> 
> MISRA R16.6 states that "Every switch statement shall have at least two
> switch-clauses". There are only 3 violations left on x86 (zero on ARM).
> 
> One of them is only a violation depending on the kconfig configuration.
> So deviate it instead with a SAF comment.
> 
> Two of them are deliberate to enable future additions. Deviate them as
> such.
> 
> Signed-off-by: Stefano Stabellini 

Acked-by: Jan Beulich 





Re: [PATCH v3] xen/dom0less: support for vcpu affinity

2025-02-19 Thread Roger Pau Monné
On Tue, Feb 18, 2025 at 12:29:24PM -0800, Stefano Stabellini wrote:
> Add vcpu affinity to the dom0less bindings. Example:
> 
> dom1 {
> ...
> cpus = <4>;
> vcpu0 {
>compatible = "xen,vcpu-affinity";
>id = <0>;
>hard-affinity = "4-7";
> };
> vcpu1 {
>compatible = "xen,vcpu-affinity";
>id = <1>;
>hard-affinity = "0-3";
> };
> vcpu2 {
>compatible = "xen,vcpu-affinity";
>id = <2>;
>hard-affinity = "1,6";
> };
> ...
> 
> Note that the property hard-affinity is optional. It is possible to add
> other properties in the future not only to specify soft affinity, but
> also to provide more precise methods for configuring affinity. For
> instance, on ARM the MPIDR could be use to specify the pCPU. For now, it
> is left to the future.
> 
> Signed-off-by: Xenia Ragiadakou 
> Signed-off-by: Stefano Stabellini 

Sorry to be picky, just an unrelated nit, but usually the first SoB
matches the "From:" field in the patch, or a commit body "From:" tag,
for example:

https://lore.kernel.org/xen-devel/20250124120112.56678-2-roger@citrix.com/

Thanks, Roger.



Re: [PATCH v2 04/11] xen/amd: export processor max frequency value

2025-02-19 Thread Jan Beulich
On 19.02.2025 08:32, Penny, Zheng wrote:
> I've written the following codes to let amd_log_freq() also adapt for 1a+
> ```
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -579,8 +579,7 @@ void amd_log_freq(const struct cpuinfo_x86 *c)
> unsigned int idx = 0, h;
> uint64_t hi, lo, val;
> 
> -   if (c->x86 < 0x10 || c->x86 > 0x19 ||
> -   (c != &boot_cpu_data &&
> +   if (c->x86 < 0x10 || (c != &boot_cpu_data &&
>  (!opt_cpu_info || (c->apicid & (c->x86_num_siblings - 1)
> return;

On what basis do you drop the upper bound here altogether? Is there some
guarantee given somewhere that the MSR layout isn't going to change again?
You also want to pay attention to style (indentation here in particular)
when making such adjustments.

The conditional here also continues to mean the rest of the function won't
normally be executed for all CPUs. Maybe that was intentional, with the
goal of just adding Fam1A support here. Hard to tell without a patch
description.

> @@ -653,21 +652,23 @@ void amd_log_freq(const struct cpuinfo_x86 *c)
> wrmsrl(MSR_AMD64_NB_CFG, nbcfg);
> }
> 
> +#define VALIDATE_FID(v) (c->x86 < 0x19 ? true : ((v & 0xfff) > 0x0f))

Please be sure to parenthesize macro arguments when used in expressions.
Not doing so violates at least one Misra rule. At the same time, seeing
how many parentheses there are already in e.g. FREQ(), please try to
avoid adding excess ones (here and there).

Also, if you add such validation, Wouldn't that be equally needed for e.g.
Fam19 (didn't check others)? Plus if you validate FID there, wouldn't it
be yet more important to validate the divisor, too? (So far we've gone
from the assumption that firmware will put sane values in when setting
PstateEn.)

> lo = 0; /* gcc may not recognize the loop having at least 5 
> iterations */
> for (h = c->x86 == 0x10 ? 5 : 8; h--; )
> -   if (!rdmsr_safe(0xC0010064 + h, lo) && (lo >> 63))
> -   break;
> +   if (!rdmsr_safe(0xC0010064 + h, lo) && (lo >> 63) && 
> VALIDATE_FID(lo))
> +   break;
> if (!(lo >> 63))
> return;
> 
> -#define FREQ(v) (c->x86 < 0x17 ? v) & 0x3f) + 0x10) * 100) >> (((v) >> 
> 6) & 7) \
> -: (((v) & 0xff) * 25 * 8) / (((v) >> 8) 
> & 0x3f))
> +#define FREQ(v) (c->x86 > 0x19 ? ((v & 0xfff) * 5) : 
> \
> +   c->x86 < 0x17 ? v) & 0x3f) + 0x10) * 100) 
> >> (((v) >> 6) & 7) \
> +   : (((v) & 0xff) * 25 * 8) / (((v) >> 8) & 
> 0x3f))

This is getting unwieldy, I'm afraid. We may need to introduce a helper
function here. Or it would need re-wrapping / re-indentation to become
halfway readable again. Plus can we please arrange things so we handle
families in either consistently ascending order, or consistently
descending one?

Jan



[PATCH] x86/MCE-telem: adjust cookie definition

2025-02-19 Thread Jan Beulich
struct mctelem_ent is opaque outside of mcetelem.c; the cookie
abstraction exists - afaict - just to achieve this opaqueness. Then it
is irrelevant though which kind of pointer mctelem_cookie_t resolves to.
IOW we can as well use struct mctelem_ent there, allowing to remove the
casts from COOKIE2MCTE() and MCTE2COOKIE(). Their removal addresses
Misra C:2012 rule 11.2 ("Conversions shall not be performed between a
pointer to an incomplete type and any other type") violations.

No functional change intended.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -64,8 +64,8 @@ struct mctelem_ent {
 
 #define MC_NENT (MC_URGENT_NENT + MC_NONURGENT_NENT)
 
-#defineCOOKIE2MCTE(c)  ((struct mctelem_ent *)(c))
-#defineMCTE2COOKIE(tep)((mctelem_cookie_t)(tep))
+#defineCOOKIE2MCTE(c)  (c)
+#defineMCTE2COOKIE(tep)(tep)
 
 static struct mc_telem_ctl {
/* Linked lists that thread the array members together.
--- a/xen/arch/x86/cpu/mcheck/mctelem.h
+++ b/xen/arch/x86/cpu/mcheck/mctelem.h
@@ -52,7 +52,7 @@
  * the element from the processing list.
  */
 
-typedef struct mctelem_cookie *mctelem_cookie_t;
+typedef struct mctelem_ent *mctelem_cookie_t;
 
 typedef enum mctelem_class {
 MC_URGENT,



[PATCH] x86/MCE-telem: drop unnecessary per-CPU field

2025-02-19 Thread Jan Beulich
struct mc_telem_cpu_ctl's processing field is used solely in
mctelem_process_deferred(), where the local variable can as well be used
directly when retrieving the head of the list to process. This then also
eliminates the field holding a dangling pointer once the processing of
the list finished, in particular when the entry is handed to
mctelem_dismiss().

No functional change intended.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -122,7 +122,6 @@ struct mc_telem_cpu_ctl {
 * to guarantee the above mutual exclusivity.
 */
struct mctelem_ent *pending, *lmce_pending;
-   struct mctelem_ent *processing;
 };
 static DEFINE_PER_CPU(struct mc_telem_cpu_ctl, mctctl);
 
@@ -233,9 +232,7 @@ void mctelem_process_deferred(unsigned i
 * handled by another round of MCE softirq.
 */
mctelem_xchg_head(lmce ? &ctl->lmce_pending : &ctl->pending,
- &this_cpu(mctctl.processing), NULL);
-
-   head = this_cpu(mctctl.processing);
+ &head, NULL);
 
/*
 * Then, fix up the list to include prev pointers, to make



[PATCH] x86/PV: don't half-open-code SIF_PM_MASK

2025-02-19 Thread Jan Beulich
Avoid using the same literal number (8) in two distinct places.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -886,7 +886,7 @@ static int __init dom0_construct(struct
 si->flags= SIF_PRIVILEGED | SIF_INITDOMAIN;
 if ( !vinitrd_start && initrd_len )
 si->flags   |= SIF_MOD_START_PFN;
-si->flags   |= (xen_processor_pmbits << 8) & SIF_PM_MASK;
+si->flags   |= MASK_INSR(xen_processor_pmbits, SIF_PM_MASK);
 si->pt_base  = vpt_start;
 si->nr_pt_frames = nr_pt_pages;
 si->mfn_list = vphysmap_start;



Re: [PATCH v3] xen/dom0less: support for vcpu affinity

2025-02-19 Thread Julien Grall

Hi Stefano,

On 18/02/2025 20:29, Stefano Stabellini wrote:

Add vcpu affinity to the dom0less bindings. Example:

 dom1 {
 ...
 cpus = <4>;
 vcpu0 {
compatible = "xen,vcpu-affinity";


I would prefer if the compatible is "xen,vcpu". This would allow us to 
extend for anything that vCPU specific. I would also look less odd if 
someone ...



id = <0>;
hard-affinity = "4-7";


... doesn't specify hard-affinity which is optional.


 };
 vcpu1 {
compatible = "xen,vcpu-affinity";
id = <1>;
hard-affinity = "0-3";


NIT: This example is exactly the same as vcpu0. How about changing to a 
list of range/single value? This would make clear that a mix is possible.



 };
 vcpu2 {
compatible = "xen,vcpu-affinity";
id = <2>;
hard-affinity = "1,6";
 };
 ...

Note that the property hard-affinity is optional. It is possible to add
other properties in the future not only to specify soft affinity, but
also to provide more precise methods for configuring affinity. For
instance, on ARM the MPIDR could be use to specify the pCPU. For now, it
is left to the future.

Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Stefano Stabellini 
---
Changes in v3:
- improve commit message
- improve binding doc
- add panic on invalid pCPU
- move parsing to a separate function

diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt
index 9c881baccc..10e55c825c 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -324,6 +324,27 @@ The ramdisk sub-node has the following properties:
  property because it will be created by the UEFI stub on boot.
  This option is needed only when UEFI boot is used.
  
+Under the "xen,domain" compatible node, it is possible optionally to add

+one or more sub-nodes to configure vCPU affinity. The vCPU affinity
+sub-node has the following properties:
+
+- compatible
+
+"xen,vcpu-affinity"
+
+- id
+
+A 32-bit integer that specifies the vCPU id. 0 is the first vCPU.
+The last vCPU is cpus-1, where "cpus" is the number of vCPUs
+specified with the "cpus" property under the "xen,domain" node.


I think it would be worth mentioning that each node must have a unique 
ID. It is not necessary to check in the code, but it would avoid the 
question of what happen if someone specify twice the VCPU with different 
affinity.



+
+- hard-affinity
+
+Optional. A string specifying the hard affinity configuration for the
+vCPU: a comma-separated list of pCPUs or ranges of pCPUs is used.
+Ranges are hyphen-separated intervals (such as `0-4`) and are inclusive
+on both sides. The numbers refer to pCPU ids.


Technically MPIDRs are pCPUs ID. So I would add "logical" in front of 
pCPU ids to make clear what IDs are we talking about



+
  
  Example

  ===


No update to the example?


diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 49d1f14d65..e364820189 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -810,6 +810,68 @@ static int __init construct_domU(struct domain *d,
  return rc;
  }
  
+static void __init domain_vcpu_affinity(struct domain *d,

+const struct dt_device_node *node)

> +{> +const char *hard_affinity_str = NULL;

+struct dt_device_node *np;
+uint32_t val;
+int rc;


Can you expain why 'rc', 'val', 'hard_affinity_str' are defined outside 
of the loop when ...



+
+dt_for_each_child_node(node, np)
+{
+const char *s;
+struct vcpu *v;
+cpumask_t affinity;


... they are not? Yet they have the same property (i.e. only called 
within the loop).



+
+if ( !dt_device_is_compatible(np, "xen,vcpu-affinity") )
+continue;
+
+if ( !dt_property_read_u32(np, "id", &val) )


Looking at the binding you wrote, "id" is mandatory. So I think we 
should throw an error if it is not present.



+continue;

> +> +if ( val >= d->max_vcpus )

+panic("Invalid vcpu_id %u for domain %s\n", val, 
dt_node_name(node));


NIT: Maybe print the maximum number of vCPUs? This would make easier to 
know what's wrong with the ID.



+
+v = d->vcpu[val];
+rc = dt_property_read_string(np, "hard-affinity", &hard_affinity_str);
+if ( rc < 0 )
+continue;
+
+s = hard_affinity_str;


OOI, you don't seem to use hard_affinity_str afterwards, so why can't we 
use 'hard_affinity_str' directly and avoid an extra variable?



+cpumask_clear(&affinity);
+while ( *s != '\0' )
+{
+unsigned int start, end;
+
+start = simple_strtoul(s, &s, 0);
+
+

Re: [PATCH] x86/PV: don't half-open-code SIF_PM_MASK

2025-02-19 Thread Andrew Cooper
On 19/02/2025 10:02 am, Jan Beulich wrote:
> Avoid using the same literal number (8) in two distinct places.

You say two places but this is only one hunk.  I presume you mean
SIF_PM_MASK as the other place.

In which case I'd suggest that this would be clearer if phrased as "Use
MASK_INTR() to avoid opencoding the literal 8."

>
> Signed-off-by: Jan Beulich 

Preferably with some kind of adjustment to the commit message, Acked-by:
Andrew Cooper 



Re: [PATCH] x86/PV: don't half-open-code SIF_PM_MASK

2025-02-19 Thread Jan Beulich
On 19.02.2025 11:29, Andrew Cooper wrote:
> On 19/02/2025 10:02 am, Jan Beulich wrote:
>> Avoid using the same literal number (8) in two distinct places.
> 
> You say two places but this is only one hunk.  I presume you mean
> SIF_PM_MASK as the other place.

Indeed. Somewhere there needs to be a literal number. Just that it should
be only one place rather than two. Obviously that other place isn't
touched, and hence isn't visible in the patch itself.

> In which case I'd suggest that this would be clearer if phrased as "Use
> MASK_INTR() to avoid opencoding the literal 8."

I've appended this to the sentence there was, i.e "..., using MASK_INTR()
...". To be honest, given the simplicity of the code change, I didn't
think it would be necessary to also say this verbally.

>> Signed-off-by: Jan Beulich 
> 
> Preferably with some kind of adjustment to the commit message, Acked-by:
> Andrew Cooper 

Thanks.

Jan



Re: [PATCH] x86/MCE-telem: drop unnecessary per-CPU field

2025-02-19 Thread Andrew Cooper
On 19/02/2025 10:01 am, Jan Beulich wrote:
> struct mc_telem_cpu_ctl's processing field is used solely in
> mctelem_process_deferred(), where the local variable can as well be used
> directly when retrieving the head of the list to process. This then also
> eliminates the field holding a dangling pointer once the processing of
> the list finished, in particular when the entry is handed to
> mctelem_dismiss().
>
> No functional change intended.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 



Re: [PATCH] x86/PV: don't half-open-code SIF_PM_MASK

2025-02-19 Thread Andrew Cooper
On 19/02/2025 10:34 am, Jan Beulich wrote:
> On 19.02.2025 11:29, Andrew Cooper wrote:
>> On 19/02/2025 10:02 am, Jan Beulich wrote:
>>> Avoid using the same literal number (8) in two distinct places.
>> You say two places but this is only one hunk.  I presume you mean
>> SIF_PM_MASK as the other place.
> Indeed. Somewhere there needs to be a literal number. Just that it should
> be only one place rather than two. Obviously that other place isn't
> touched, and hence isn't visible in the patch itself.
>
>> In which case I'd suggest that this would be clearer if phrased as "Use
>> MASK_INTR() to avoid opencoding the literal 8."
> I've appended this to the sentence there was, i.e "..., using MASK_INTR()
> ...". To be honest, given the simplicity of the code change, I didn't
> think it would be necessary to also say this verbally.

Honestly, you saying "two distinct places" for a while made me think
you'd forgotten a hunk.  It took longer than I care to admin to realise
that the change was in fact correct as-is.

~Andrew



Re: [PATCH] x86/MCE-telem: adjust cookie definition

2025-02-19 Thread Jan Beulich
On 19.02.2025 11:44, Andrew Cooper wrote:
> On 19/02/2025 10:00 am, Jan Beulich wrote:
>> struct mctelem_ent is opaque outside of mcetelem.c; the cookie
>> abstraction exists - afaict - just to achieve this opaqueness. Then it
>> is irrelevant though which kind of pointer mctelem_cookie_t resolves to.
>> IOW we can as well use struct mctelem_ent there, allowing to remove the
>> casts from COOKIE2MCTE() and MCTE2COOKIE(). Their removal addresses
>> Misra C:2012 rule 11.2 ("Conversions shall not be performed between a
>> pointer to an incomplete type and any other type") violations.
>>
>> No functional change intended.
>>
>> Signed-off-by: Jan Beulich 
> 
> https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/9181587757
> 
> Eclair does appear to be happy with this approach (assuming I stripped
> down to only checking R11.2 correctly, and making it fatal).
> 
> For the change itself, it's an almost identical binary, differing only
> in the string section which I expect means some embedded line numbers.

Line number changes would be odd, given the patch doesn't add or remove
any lines. I expect its internal kind-of-sequence numbers of the compiler,
used to e.g. generate local label names.

> Reviewed-by: Andrew Cooper 

Thanks.

Jan



Re: [PATCH] x86/MCE-telem: adjust cookie definition

2025-02-19 Thread Andrew Cooper
On 19/02/2025 10:00 am, Jan Beulich wrote:
> struct mctelem_ent is opaque outside of mcetelem.c; the cookie
> abstraction exists - afaict - just to achieve this opaqueness. Then it
> is irrelevant though which kind of pointer mctelem_cookie_t resolves to.
> IOW we can as well use struct mctelem_ent there, allowing to remove the
> casts from COOKIE2MCTE() and MCTE2COOKIE(). Their removal addresses
> Misra C:2012 rule 11.2 ("Conversions shall not be performed between a
> pointer to an incomplete type and any other type") violations.
>
> No functional change intended.
>
> Signed-off-by: Jan Beulich 

https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/9181587757

Eclair does appear to be happy with this approach (assuming I stripped
down to only checking R11.2 correctly, and making it fatal).

For the change itself, it's an almost identical binary, differing only
in the string section which I expect means some embedded line numbers.

Reviewed-by: Andrew Cooper 



Re: [PATCH] x86/MCE-telem: adjust cookie definition

2025-02-19 Thread Nicola Vetrini

On 2025-02-19 11:44, Andrew Cooper wrote:

On 19/02/2025 10:00 am, Jan Beulich wrote:

struct mctelem_ent is opaque outside of mcetelem.c; the cookie
abstraction exists - afaict - just to achieve this opaqueness. Then it
is irrelevant though which kind of pointer mctelem_cookie_t resolves 
to.
IOW we can as well use struct mctelem_ent there, allowing to remove 
the

casts from COOKIE2MCTE() and MCTE2COOKIE(). Their removal addresses
Misra C:2012 rule 11.2 ("Conversions shall not be performed between a
pointer to an incomplete type and any other type") violations.

No functional change intended.

Signed-off-by: Jan Beulich 


https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/9181587757

Eclair does appear to be happy with this approach (assuming I stripped
down to only checking R11.2 correctly, and making it fatal).



The analysis settings are correct.


For the change itself, it's an almost identical binary, differing only
in the string section which I expect means some embedded line numbers.

Reviewed-by: Andrew Cooper 


--
Nicola Vetrini, B.Sc.
Software Engineer
BUGSENG (https://bugseng.com)
LinkedIn: https://www.linkedin.com/in/nicola-vetrini-a42471253



Re: struct mctelem_cookie missing definition

2025-02-19 Thread Nicola Vetrini

On 2025-02-18 22:37, Stefano Stabellini wrote:

On Tue, 18 Feb 2025, Jan Beulich wrote:

On 18.02.2025 03:45, Stefano Stabellini wrote:
> On Mon, 17 Feb 2025, Jan Beulich wrote:
>> On 15.02.2025 09:59, Nicola Vetrini wrote:
>>> On 2025-02-15 00:04, Stefano Stabellini wrote:
 On Fri, 14 Feb 2025, Jan Beulich wrote:
>> Would deviating macros "COOKIE2MCTE" and "MCTE2COOKIE" work?
>
> If it did, COOKIE2ID and ID2COOKIE would likely need including as
> well.

 I wrote this patch. Nicola, can you please check the changes to
 deviations.ecl, this is the first time I try to write the deviation
 myself :-)

 ---
 misra: add 11.2 deviation for incomplete types

 struct mctelem_cookie* is used exactly because it is an incomplete type
 so the pointer cannot be dereferenced. This is deliberate. So add a
 deviation for it.

 In deviations.ecl, add the list of macros that do the conversions to
 and
 from struct mctelem_cookie*.

 Signed-off-by: Stefano Stabellini 

 diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl
 b/automation/eclair_analysis/ECLAIR/deviations.ecl
 index a28eb0ae76..87bfd2160c 100644
 --- a/automation/eclair_analysis/ECLAIR/deviations.ecl
 +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
 @@ -366,6 +366,14 @@ constant expressions are required.\""
  }
  -doc_end

 +-doc_begin="Certain pointers point to incomplete types purposely so
 that they are impossible to dereference."
 +-config=MC3A2.R11.2,reports+={deliberate,
 "any_area(any_loc(any_exp(macro(^COOKIE2MCTE$"}
 +-config=MC3A2.R11.2,reports+={deliberate,
 "any_area(any_loc(any_exp(macro(^MCTE2COOKIE$"}
 +-config=MC3A2.R11.2,reports+={deliberate,
 "any_area(any_loc(any_exp(macro(^COOKIE2ID$"}
 +-config=MC3A2.R11.2,reports+={deliberate,
 "any_area(any_loc(any_exp(macro(^ID2COOKIE$"}
 +}
>>>
>>> -config=MC3A2.R11.2,reports+={deliberate,
>>> 
"any_area(any_loc(any_exp(macro(name(COOKIE2MCTE||MCTE2COOKIE||COOKIE2ID||ID2COOKIE)"}
>>>
>>> Note however that there is also this deviation in place
>>>
>>> -doc_begin="The conversion from a pointer to an incomplete type to
>>> unsigned long does not lose any information, provided that the target
>>> type has enough bits to store it."
>>> -config=MC3A2.R11.2,casts+={safe,
>>>"from(type(any()))
>>> &&to(type(canonical(builtin(unsigned long
>>> &&relation(definitely_preserves_value)"
>>> }
>>> -doc_end
>>>
>>> I was a bit confused by Jan's remark, which seemed correct, but I
>>> couldn't see any violations in the report, so I dug a bit deeper.
>>> ID2COOKIE and COOKIE2ID, which operate to/from unsigned long are already
>>> excluded due to being safe. If you envision those macros to be used with
>>> other types, then your deviation should mention them, otherwise they are
>>> already handled.
>>
>> Yet then can't we leverage that deviation to also make the other two
>> covered:
>>
>> #defineCOOKIE2MCTE(c)  ((struct mctelem_ent *)(unsigned 
long)(c))
>> #defineMCTE2COOKIE(tep)((mctelem_cookie_t)(unsigned 
long)(tep))
>
> Jan is asking why ID2COOKIE and COOKIE2ID are considered safe, while
> COOKIE2MCTE and MCTE2COOKIE are not. I think the reason is that
> ID2COOKIE and COOKIE2ID convert to/from unsigned long and that falls
> under the other deviation we already have:
>
> -doc_begin="The conversion from a pointer to an incomplete type to
> unsigned long does not lose any information, provided that the target
> type has enough bits to store it."
> -config=MC3A2.R11.2,casts+={safe,
>"from(type(any()))
> &&to(type(canonical(builtin(unsigned long
> &&relation(definitely_preserves_value)"
>
> On the other hand COOKIE2MCTE and MCTE2COOKIE convert to/from another
> pointer type, so they don't fall under the same deviation.

And then the adjusted forms suggested above ought to also be covered,
I would have thought.


I understand your point. I tried it, but it does not work. I do not 
know

why. Someone with more knowledge of ECLAIR internals than I have might
be able to explain.

https://saas.eclairit.com:3787/fs/var/local/eclair/xen-project.ecdf/xen-project/people/sstabellini/xen/ECLAIR_normal/my-eclair-11.2-4-1/X86_64/9176469474/PROJECT.ecd;/by_service/MC3A2.R11.2.html#{%22select%22:true,%22selection%22:{%22hiddenAreaKinds%22:[],%22hiddenSubareaKinds%22:[],%22show%22:false,%22selector%22:{%22enabled%22:true,%22negated%22:true,%22kind%22:0,%22domain%22:%22kind%22,%22inputs%22:[{%22enabled%22:true,%22text%22:%22violation%22}]}}}



The reason is quite simple: the deviation is for casts from any type to 
unsigned long. It would need a similar configuration 
from(type(canonical(builtin(unsigned long to(any()) in order to 
catch those.




I suggest we go with this patch instead.

---
misra: add 11.2 deviation for incomplete types

struct mctelem_cookie* 

Re: [PATCH for 4.21 v6 2/2] xen/riscv: identify specific ISA supported by cpu

2025-02-19 Thread Jan Beulich
On 12.02.2025 17:50, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/arch/riscv/cpufeature.c
> @@ -0,0 +1,502 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Originally taken for Linux kernel v6.12-rc3.
> + *
> + * Copyright (C) 2015 ARM Ltd.
> + * Copyright (C) 2017 SiFive
> + * Copyright (C) 2024 Vates
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#ifdef CONFIG_ACPI
> +# error "cpufeature.c functions should be updated to support ACPI"
> +#endif
> +
> +struct riscv_isa_ext_data {
> +unsigned int id;
> +const char *name;
> +};
> +
> +#define RISCV_ISA_EXT_DATA(ext_name)\
> +{   \
> +.id = RISCV_ISA_EXT_##ext_name, \

Nit: ## being a binary operator (just for the pre-processor) we prefer
it, too, to be framed by blanks.

> +/*
> + * The canonical order of ISA extension names in the ISA string is defined in
> + * chapter 27 of the unprivileged specification.
> + *
> + * The specification uses vague wording, such as should, when it comes to
> + * ordering, so for our purposes the following rules apply:
> + *
> + * 1. All multi-letter extensions must be separated from other extensions by 
> an
> + *underscore.
> + *
> + * 2. Additional standard extensions (starting with 'Z') must be sorted after
> + *single-letter extensions and before any higher-privileged extensions.
> + *
> + * 3. The first letter following the 'Z' conventionally indicates the most
> + *closely related alphabetical extension category, IMAFDQLCBKJTPVH.
> + *If multiple 'Z' extensions are named, they must be ordered first by
> + *category, then alphabetically within a category.
> + *
> + * 4. Standard supervisor-level extensions (starting with 'S') must be listed
> + *after standard unprivileged extensions.  If multiple supervisor-level
> + *extensions are listed, they must be ordered alphabetically.
> + *
> + * 5. Standard machine-level extensions (starting with 'Zxm') must be listed
> + *after any lower-privileged, standard extensions.  If multiple
> + *machine-level extensions are listed, they must be ordered
> + *alphabetically.
> + *
> + * 6. Non-standard extensions (starting with 'X') must be listed after all
> + *standard extensions. If multiple non-standard extensions are listed, 
> they
> + *must be ordered alphabetically.
> + *
> + * An example string following the order is:
> + *rv64imadc_zifoo_zigoo_zafoo_sbar_scar_zxmbaz_xqux_xrux
> + *
> + * New entries to this struct should follow the ordering rules described 
> above.
> + *
> + * Extension name must be all lowercase (according to device-tree binding)
> + * and strncmp() is used in match_isa_ext() to compare extension names 
> instead
> + * of strncasecmp().
> + */
> +const struct riscv_isa_ext_data __initconst riscv_isa_ext[] = {
> +RISCV_ISA_EXT_DATA(i),
> +RISCV_ISA_EXT_DATA(m),
> +RISCV_ISA_EXT_DATA(a),
> +RISCV_ISA_EXT_DATA(f),
> +RISCV_ISA_EXT_DATA(d),
> +RISCV_ISA_EXT_DATA(q),
> +RISCV_ISA_EXT_DATA(c),
> +RISCV_ISA_EXT_DATA(h),
> +RISCV_ISA_EXT_DATA(zicntr),
> +RISCV_ISA_EXT_DATA(zicsr),
> +RISCV_ISA_EXT_DATA(zifencei),
> +RISCV_ISA_EXT_DATA(zihintpause),
> +RISCV_ISA_EXT_DATA(zihpm),
> +RISCV_ISA_EXT_DATA(zbb),

No Zba and Zbs here, despite there now being enumerators for them?

> +static int __init riscv_isa_parse_string(const char *isa,
> + unsigned long *out_bitmap)
> +{
> +if ( (isa[0] != 'r') && (isa[1] != 'v') )
> +return -EINVAL;
> +
> +#if defined(CONFIG_RISCV_32)
> +if ( isa[2] != '3' && isa[3] != '2' )
> +return -EINVAL;
> +#elif defined(CONFIG_RISCV_64)
> +if ( isa[2] != '6' && isa[3] != '4' )
> +return -EINVAL;
> +#else
> +# error "unsupported RISC-V bitness"
> +#endif
> +
> +/*
> + * In unpriv. specification (*_20240411) is mentioned the following:
> + * (1) A RISC-V ISA is defined as a base integer ISA, which must be
> + * present in any implementation, plus optional extensions to
> + * the base ISA.
> + * (2) Chapter 6 describes the RV32E and RV64E subset variants of
> + * the RV32I or RV64I base instruction sets respectively, which
> + * have been added to support small microcontrollers, and which
> + * have half the number of integer registers.
> + *
> + * What means that isa should contain, at least, I or E.
> + *
> + * As Xen isn't expected to be run on microcontrollers and according
> + * to device tree binding the first extension should be "i".
> + */
> +if ( isa[4] != 'i' )
> +return -EINVAL;
> +
> +isa += 4;
> +
> +while ( *isa )
> +{
> +const char *ext = isa++;
> +const char *ext_end = isa;
> +
> +switch ( *ext )
> +{
> +case 'x':
> +prin

Re: [PATCH for 4.20? v4 1/3] xen/riscv: implement software page table walking

2025-02-19 Thread Jan Beulich
On 12.02.2025 17:50, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/pt.c
> +++ b/xen/arch/riscv/pt.c
> @@ -185,6 +185,68 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, 
> unsigned int offset)
>  return XEN_TABLE_NORMAL;
>  }
>  
> +/*
> + * _pt_walk() performs software page table walking and returns the pte_t of
> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
> + * for further analysis.
> + *
> + * Additionally, _pt_walk() returns the level of the found pte by using
> + * `pte_level` argument.
> + * `pte_level` is optional, set `pte_level`=NULL if a caller doesn't need
> + * the level of the found pte.

How about this, reducing redundancy a little?

 * _pt_walk() can optionally return the level of the found pte. Pass NULL
 * for `pte_level` if this information isn't needed.

> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
> +{
> +pte_t *entry = _pt_walk(va, pte_level);
> +pte_t pte = *entry;
> +
> +unmap_table(entry);
> +
> +return pte;
> +}

"entry" especially in this context is ambiguous. I would expect a variable of
this name to be of type pte_t, not pte_t *. How about "ptep"?

Preferably with these adjustments, which I'd be fine making while committing,
Reviewed-by: Jan Beulich 

Considering the 4.20? tag you'll need to decide whether you still want this
in before the release.

Jan



Re: [PATCH for 4.20? v4 2/3] xen/riscv: update defintion of vmap_to_mfn()

2025-02-19 Thread Jan Beulich
On 12.02.2025 17:50, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/page.h
> +++ b/xen/arch/riscv/include/asm/page.h
> @@ -210,6 +210,13 @@ static inline pte_t pte_from_mfn(mfn_t mfn, unsigned int 
> flags)
>  
>  pte_t pt_walk(vaddr_t va, unsigned int *pte_level);
>  
> +static inline mfn_t _vmap_to_mfn(vaddr_t va)
> +{
> +pte_t entry = pt_walk(va, NULL);
> +BUG_ON(!pte_is_mapping(entry));
> +return mfn_from_pte(entry);
> +}

Nit: Blank line between declaration(s) and statement(s) please. Blank line
ahead of the main return statement of a function please. With that (happy
to adjust while committing):
Reviewed-by: Jan Beulich 

Jan



Re: [PATCH v2 1/2] xen/arm: fix iomem permissions cfg in map_range_to_domain()

2025-02-19 Thread Julien Grall

Hi Grygorii,

On 18/02/2025 11:22, Grygorii Strashko wrote:

Now the following code in map_range_to_domain()

 res = iomem_permit_access(d, paddr_to_pfn(addr),
 paddr_to_pfn(PAGE_ALIGN(addr + len - 1)));

calculates the iomem range end address by rounding it up to the next Xen
page with incorrect assumption that iomem range end address passed to
iomem_permit_access() is exclusive, while it is expected to be inclusive.
It gives Control domain (Dom0) access to manage incorrect MMIO range with
one additional page.

For example, if requested range is [00e614:00e6141004] then it expected
to add [e6140:e6141] range (num_pages=2) to the domain iomem_caps rangeset,
but will add [e6140:e6142] (num_pages=3) instead.

To fix it, drop PAGE_ALIGN() from the iomem range end address calculation
formula.

Fixes: 33233c2758345 ("arch/arm: domain build: let dom0 access I/O memory
of mapped devices")
Signed-off-by: Grygorii Strashko 


Reviewed-by: Julien Grall 

Cheers,


---
  xen/arch/arm/device.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/arm/device.c b/xen/arch/arm/device.c
index 5610cddcba8e..97e613e06afa 100644
--- a/xen/arch/arm/device.c
+++ b/xen/arch/arm/device.c
@@ -71,7 +71,7 @@ int map_range_to_domain(const struct dt_device_node *dev,
   strlen("/reserved-memory/")) != 0 )
  {
  res = iomem_permit_access(d, paddr_to_pfn(addr),
-paddr_to_pfn(PAGE_ALIGN(addr + len - 1)));
+  paddr_to_pfn(addr + len - 1));
  if ( res )
  {
  printk(XENLOG_ERR "Unable to permit to dom%d access to"


--
Julien Grall




Re: [PATCH v2 2/2] xen/arm: fix iomem_ranges cfg in map_range_to_domain()

2025-02-19 Thread Julien Grall

Hi Grygorii,

On 18/02/2025 11:22, Grygorii Strashko wrote:

Now the following code in map_range_to_domain()

  res = rangeset_add_range(mr_data->iomem_ranges,
   paddr_to_pfn(addr),
   paddr_to_pfn_aligned(addr + len - 1));
  where
   paddr_to_pfn_aligned(paddr) defined as paddr_to_pfn(PAGE_ALIGN(paddr))

calculates the iomem range end address by rounding it up to the next Xen
page with incorrect assumption that iomem range end address passed to
rangeset_add_range() is exclusive, while it is expected to be inclusive.

For example, if requested range is [00e614:00e6141004] then it expected
to add [e6140:e6141] range (num_pages=2) to the mr_data->iomem_ranges
rangeset, but will add [e6140:e6142] (num_pages=3) instead.

To fix it, drop PAGE_ALIGN() from the iomem range end address calculation
formula and just use paddr_to_pfn(addr + len - 1).

Fixes: 57d4d7d4e8f3b (arm/asm/setup.h: Update struct map_range_data to add
rangeset.")
Signed-off-by: Grygorii Strashko 


Reviewed-by: Julien Grall 

Cheers,

--
Julien Grall




Re: [PATCH for 4.20? v4 3/3] xen/riscv: update mfn calculation in pt_mapping_level()

2025-02-19 Thread Jan Beulich
On 12.02.2025 17:50, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/pt.c
> +++ b/xen/arch/riscv/pt.c
> @@ -249,12 +249,10 @@ pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>  
>  /* Update an entry at the level @target. */
>  static int pt_update_entry(mfn_t root, vaddr_t virt,
> -   mfn_t mfn, unsigned int target,
> +   mfn_t mfn, unsigned int *target,
> unsigned int flags)
>  {
>  int rc;
> -unsigned int level = HYP_PT_ROOT_LEVEL;
> -pte_t *table;
>  /*
>   * The intermediate page table shouldn't be allocated when MFN isn't
>   * valid and we are not populating page table.
> @@ -265,41 +263,48 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>   * combinations of (mfn, flags).
>  */
>  bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
> -pte_t pte, *entry;
> -
> -/* convenience aliases */
> -DECLARE_OFFSETS(offsets, virt);
> +pte_t pte, *entry = NULL;

With there also being "table" below, "entry" isn't quite as bad as in the
other patch. Yet I'd still like to ask that you consider renaming.

> -table = map_table(root);
> -for ( ; level > target; level-- )
> +if ( *target == CONFIG_PAGING_LEVELS )
> +entry = _pt_walk(virt, target);

Imo it's quite important for the comment ahead of the function to be updated
to mention this special case.

> +else
>  {
> -rc = pt_next_level(alloc_tbl, &table, offsets[level]);
> -if ( rc == XEN_TABLE_MAP_NOMEM )
> +pte_t *table;
> +unsigned int level = HYP_PT_ROOT_LEVEL;
> +/* convenience aliases */

Nit: Style.

> @@ -331,7 +336,8 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>  rc = 0;
>  
>   out:
> -unmap_table(table);
> +if ( entry )
> +unmap_table(entry);

Would it perhaps be worth for unmap_table() to gracefully handle being passed
NULL, to avoid such conditionals (there may be more in the future)?

Jan



Re: [PATCH] xen/arm: fix iomem_ranges cfg in map_range_to_domain()

2025-02-19 Thread Julien Grall

Hi Andrew,

On 14/02/2025 14:25, Andrew Cooper wrote:

On 14/02/2025 2:10 pm, Grygorii Strashko wrote:


For example, requested range:
    00e614 - 00e6141004 should set e6140:e6141 nr=2, but will
configure
e6140 e6142 nr=3 instead.


I am not sure what 'nr' is referring to here.


Sorry, will change to "num_pages"?


I agree Xen needs to be better (and by that, I mean consistent and
clear), but there are subtle bugs with most approaches like this.

Any exclusive bound, as well as counts like this need $n+1 bits of
arithmetic when you want to describe the boundaries of the space.

There is also a boundary condition for counts.  What map_foo(x, 0) mean?


I would consider "0" has invalid.



Real hardware uses "last" for describing boundaries (x86 specifically
calls this "limit" in the ISA, but it's a trick used by other
architectures too).  Unlike "end", it's clearly an inclusive bound.

Personally, I'd like to see Xen switch to "start, last" pairs.  It's
unambiguous and has fewest boundary cases.


"start, last" would also work for me.

Cheers,

--
Julien Grall




Re: [PATCH 0/2] code style exercise: Drivers folder samples

2025-02-19 Thread Oleksandr Andrushchenko

Hello, Jan!

On 19.02.25 14:49, Jan Beulich wrote:

On 19.02.2025 13:43, Oleksandr Andrushchenko wrote:

Hello, Jan, Stefano!

On 18.02.25 13:34, Jan Beulich wrote:

On 18.02.2025 03:36, Stefano Stabellini wrote:

On Mon, 17 Feb 2025, Jan Beulich wrote:

On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:

1. Const string arrays reformatting
In case the length of items change we might need to introduce a bigger
change wrt new formatting of unaffected lines
==

--- a/xen/drivers/acpi/tables.c
+++ b/xen/drivers/acpi/tables.c
@@ -38,10 +38,10 @@
-static const char *__initdata
-mps_inti_flags_polarity[] = { "dfl", "high", "res", "low" };
-static const char *__initdata
-mps_inti_flags_trigger[] = { "dfl", "edge", "res", "level" };
+static const char *__initdata mps_inti_flags_polarity[] = { "dfl", "high",
+"res", "low" };
+static const char *__initdata mps_inti_flags_trigger[] = { "dfl", "edge", 
"res",

--- a/xen/drivers/acpi/utilities/utglobal.c
+++ b/xen/drivers/acpi/utilities/utglobal.c
   static const char *const acpi_gbl_region_types[ACPI_NUM_PREDEFINED_REGIONS] 
= {
-   "SystemMemory",
-   "SystemIO",
-   "PCI_Config",
-   "EmbeddedControl",
-   "SMBus",
-   "CMOS",
-   "PCIBARTarget",
-   "DataTable"
+"SystemMemory", "SystemIO", "PCI_Config",   "EmbeddedControl",
+"SMBus","CMOS", "PCIBARTarget", "DataTable"
   };

Why in the world would a tool need to touch anything like the two examples
above? My take is that the code is worse readability-wise afterwards.

I think the output is acceptable: not necessarily better than before,
but also not significantly worse.

Hmm, for the change to xen/drivers/acpi/tables.c I wouldn't agree with this
statement. And for xen/drivers/acpi/utilities/utglobal.c remember that this
is code taken from ACPI CA, which we may better not re-format.

We can use /* clang-format off */ constructs to protect those lines we
do not want to be touched by clang-format [1]. This is what Grygprii
mentioned in some other e-mail.

We have fall-through comments. We have SAF comments. Yet another flavor to
keep some external tool happy. If everyone else thinks this is a good idea,
I'm not intending to stand in the way. Yet I don't like this as a workaround.
Instead I think the tool's going too far.

Yes, I do agree. But only if we talk about having an automated
code style check now (which is definitely the goal at some time).
Before that we could still use the tool to take all that good that
it does and manually prepare a set of patches to fix those
code style issues which we like.


Jan





Re: [PATCH v3 1/2] xen/passthrough: Provide stub functions when !HAS_PASSTHROUGH

2025-02-19 Thread Jan Beulich
On 19.02.2025 14:06, Luca Fancellu wrote:
>> On 19 Feb 2025, at 12:45, Jan Beulich  wrote:
>> On 18.02.2025 10:51, Luca Fancellu wrote:
>>> --- a/xen/include/xen/iommu.h
>>> +++ b/xen/include/xen/iommu.h
>>> @@ -110,6 +110,8 @@ extern int8_t iommu_hwdom_reserved;
>>>
>>> extern unsigned int iommu_dev_iotlb_timeout;
>>>
>>> +#ifdef CONFIG_HAS_PASSTHROUGH
>>> +
>>> int iommu_setup(void);
>>> int iommu_hardware_setup(void);
>>>
>>> @@ -122,6 +124,28 @@ int arch_iommu_domain_init(struct domain *d);
>>> void arch_iommu_check_autotranslated_hwdom(struct domain *d);
>>> void arch_iommu_hwdom_init(struct domain *d);
>>>
>>> +#else
>>> +
>>> +static inline int iommu_setup(void)
>>> +{
>>> +return -ENODEV;
>>> +}
>>> +
>>> +static inline int iommu_domain_init(struct domain *d, unsigned int opts)
>>> +{
>>> +/*
>>> + * When !HAS_PASSTHROUGH, iommu_enabled is set to false and the 
>>> expected
>>> + * behaviour of this function is to return success in that case.
>>> + */
>>> +return 0;
>>> +}
>>
>> Hmm. Would the function be anywhere near likely to do anything else than
>> what it's expected to do? My original concern here was with "opts"
>> perhaps asking for something that cannot be supported. But that was wrong
>> thinking on my part. Here what you do is effectively open-code what the
>> real iommu_domain_init() would do: Return success when !is_iommu_enabled().
>> Which in turn follows from !iommu_enabled when !HAS_PASSTHROUGH.
>>
>> On that basis I'd be okay if the comment was dropped again. Else it imo
>> wants re-wording to get closer to the explanation above.
> 
> Would it be ok for you a comment saying:
> “This stub returns the same as the real iommu_domain_init()
>  function: success when !is_iommu_enabled(), which value is based on 
> iommu_enabled
> that is false when !HAS_PASSTHROUGH"

I'm sorry, but this is too verbose for my taste. What's wrong with the more
terse

"Return as the real iommu_domain_init() would: Success when
 !is_iommu_enabled(), following from !iommu_enabled when !HAS_PASSTHROUGH"

?

>>> @@ -383,12 +429,12 @@ struct domain_iommu {
>>> #define iommu_set_feature(d, f)   set_bit(f, dom_iommu(d)->features)
>>> #define iommu_clear_feature(d, f) clear_bit(f, dom_iommu(d)->features)
>>>
>>> +#ifdef CONFIG_HAS_PASSTHROUGH
>>> /* Are we using the domain P2M table as its IOMMU pagetable? */
>>> #define iommu_use_hap_pt(d)   (IS_ENABLED(CONFIG_HVM) && \
>>>dom_iommu(d)->hap_pt_share)
>>>
>>> /* Does the IOMMU pagetable need to be kept synchronized with the P2M */
>>> -#ifdef CONFIG_HAS_PASSTHROUGH
>>> #define need_iommu_pt_sync(d) (dom_iommu(d)->need_sync)
>>>
>>> int iommu_do_domctl(struct xen_domctl *domctl, struct domain *d,
>>
>> Coming back to my v2 comment: Why exactly is this change needed here? If
>> things build fine without the macro being defined when !HAS_PASSTHROUGH,
>> surely they will also build fine with it being defined?
> 
> I’ve defined an empty stub on an header included only on MPU systems for the
> p2m module, this is why it is building

But that wasn't part of the patch, was it? I.e. with this series alone
applied, things still don't build?

> I didn’t modify p2m_set_way_flush() which lives in arm common code, because
> it will be used also on MPU systems (R82 has MPU at EL2 but MMU/MPU at EL1)
> and I would like to stay the same and be used by MMU/MPU subsystems.
> 
>> As per the
>> respective revlog entry, this change looks to belong into whatever is
>> going to be done to deal with the one Arm use of the macro. Or maybe
>> it's unneeded altogether.
> 
> I didn’t understand that you were opposing to protecting iommu_use_hap_pt() 
> when
> !HAS_PASSTHROUGH, I thought you were referring only to the stub in the #else
> branch.
> Can I ask why?

Sure. And no, I'm not against the extra protection. I'm against unnecessary
code churn. That is, any such a re-arrangement wants to have some kind of
justification.

> in any case when !HAS_PASSTHROUGH, this macro is not usable
> since dom_iommu() is resolved to a membed that doesn’t exist in the 
> configuration,
> am I missing something?

You very likely aren't, yet the macro's presence also does no harm. We
have lots of macros and declarations which are usable only in certain
configurations. Sometimes this just happens to be that way, sometimes it's
actually deliberate (e.g. to facilitate DCE).

Jan



Re: [PATCH v5] xen/console: print Xen version via keyhandler

2025-02-19 Thread Jan Beulich
On 17.02.2025 22:33, dm...@proton.me wrote:
> Add Xen version printout to 'h' keyhandler output.
> 
> That is useful for debugging systems that have been left intact for a long
> time.
> 
> Signed-off-by: Denis Mukhin 
> Reviewed-by: Jan Beulich 

Hmm, wait - there's yet another issue:

> --- a/xen/common/keyhandler.c
> +++ b/xen/common/keyhandler.c
> @@ -129,6 +129,10 @@ static void cf_check show_handlers(unsigned char key)
>  unsigned int i;
>  
>  printk("'%c' pressed -> showing installed handlers\n", key);
> +
> +print_version();
> +print_build_id();

Here and in console_init_preirq() you expect to be able to call the two
functions, no matter what the tool chain. Then ...

> --- a/xen/common/version.c
> +++ b/xen/common/version.c
> @@ -210,9 +210,28 @@ void __init xen_build_init(void)
>  }
>  }
>  #endif /* CONFIG_X86 */
> -if ( !rc )
> -printk(XENLOG_INFO "build-id: %*phN\n", build_id_len, build_id_p);
>  }
> +
> +void print_version(void)
> +{
> +printk("Xen version %d.%d%s (%s@%s) (%s) %s %s\n",
> +   xen_major_version(), xen_minor_version(), xen_extra_version(),
> +   xen_compile_by(), xen_compile_domain(), xen_compiler(),
> +   xen_build_info(), xen_compile_date());
> +
> +printk("Latest ChangeSet: %s\n", xen_changeset());
> +}
> +
> +void print_build_id(void)
> +{
> +/*
> + * NB: build_id_len may be 0 if XEN_HAS_BUILD_ID=n.
> + * Do not print empty build-id.
> + */
> +if ( build_id_len )
> +printk("build-id: %*phN\n", build_id_len, build_id_p);
> +}
> +
>  #endif /* BUILD_ID */

... their definitions cannot be inside an #ifdef. They want to move up:
- print_build_id() between xen_build_id() and the #ifdef BUILD_ID,
- print_version() yet higher up, perhaps after xen_build_info().
I guess I can do so while committing.

Jan



Re: [PATCH 0/2] code style exercise: Drivers folder samples

2025-02-19 Thread Andrew Cooper
On 19/02/2025 1:19 pm, Oleksandr Andrushchenko wrote:
>
>
> On 19.02.25 14:51, Oleksandr Andrushchenko wrote:
>> Hello, Andrew!
>>
>> On 19.02.25 14:49, Andrew Cooper wrote:
>>> On 16/02/2025 5:11 pm, Oleksandr Andrushchenko wrote:
 Hello, Roger!

 Please find the branch with all the conversions [1].
 Unfortunately I cannot provide a branch as seen with
 diff --ignore-all-space as such a patch will not simply apply.

 Stay safe,
 Oleksandr Andrushchenko

 On 16.02.25 13:58, Andrew Cooper wrote:
> On 16/02/2025 10:21 am, Oleksandr Andrushchenko wrote:
>> There are two diff files which show what happens in case the same is
>> applied to the whole xen/drivers directory:
>> - first one is the result of the "git diff" command, 1.2M [3]
>> - the second one is for "git diff --ignire-all-space", 600K [4]
> Please can you format everything, and put it on a branch
> somewhere, so
> people can browse.
>
> ~Andrew
 [1] https://github.com/andr2000/xen/tree/clang_ml_drivers_v002_diff
> force pushed to the same branch for all Xen

Thankyou.

~Andrew



[PATCH v2 1/2] xen/list: avoid UB in list iterators

2025-02-19 Thread Juergen Gross
The list_for_each_entry*() iterators are testing for having reached the
end of the list in a way which relies on undefined behavior: the
iterator (being a pointer to the struct of a list element) is advanced
and only then tested to have reached not the next element, but the list
head. This results in the list head being addressed via a list element
pointer, which is undefined, in case the list elements have a higher
alignment than the list head.

Avoid that by testing for the end of the list before advancing the
iterator. In case of having reached the end of the list, set the
iterator to NULL and use that for stopping the loop. This has the
additional advantage of not leaking the iterator pointing to something
which isn't a list element past the loop.

There is one case in the Xen code where the iterator is used after
the loop: it is tested to be non-NULL to indicate the loop has run
until reaching the end of the list. This case is modified to use the
iterator being NULL for indicating the end of the list has been
reached.

Reported-by: Andrew Cooper 
Signed-off-by: Juergen Gross 
Release-Acked-by: Oleksii Kurochko 
---
No proper Fixes: tag, as this bug predates Xen's git and mercurial
history.
V2:
- fix one use case (Jan Beulich)
- let list_is_first() return bool, rename parameter (Jan Beulich)
- paranthesize iterator when used as non-NULL test (Jan Beulich)
- avoid dereferencing NULL in the safe iterators (Jan Beulich)
---
 xen/drivers/passthrough/x86/hvm.c |   3 +-
 xen/include/xen/list.h| 110 +++---
 2 files changed, 73 insertions(+), 40 deletions(-)

diff --git a/xen/drivers/passthrough/x86/hvm.c 
b/xen/drivers/passthrough/x86/hvm.c
index f5faff7a49..213dd60340 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -639,12 +639,11 @@ int pt_irq_destroy_bind(
 {
 list_del(&girq->list);
 xfree(girq);
-girq = NULL;
 break;
 }
 }
 
-if ( girq )
+if ( !girq )
 {
 write_unlock(&d->event_lock);
 return -EINVAL;
diff --git a/xen/include/xen/list.h b/xen/include/xen/list.h
index 62169f4674..d90b3f6f0d 100644
--- a/xen/include/xen/list.h
+++ b/xen/include/xen/list.h
@@ -291,6 +291,17 @@ static inline void list_move_tail(struct list_head *list,
 list_add_tail(list, head);
 }
 
+/**
+ * list_is_first - tests whether @entry is the first entry in list @head
+ * @entry: the entry to test
+ * @head: the head of the list
+ */
+static inline bool list_is_first(const struct list_head *entry,
+ const struct list_head *head)
+{
+return entry->prev == head;
+}
+
 /**
  * list_is_last - tests whether @list is the last entry in list @head
  * @list: the entry to test
@@ -440,7 +451,19 @@ static inline void list_splice_init(struct list_head *list,
   */
 #define list_next_entry(pos, member) \
 list_entry((pos)->member.next, typeof(*(pos)), member)
- 
+
+/**
+  * list_next_entry_or_null - get the next element in list
+  * @pos:the type * to cursor
+  * @member: the name of the struct list_head within the struct.
+  *
+  * Note that if the end of the list is reached, it returns NULL.
+  */
+#define list_next_entry_or_null(head, pos, member) \
+((!(pos) || list_is_last(&(pos)->member, head))\
+ ? NULL\
+ : list_entry((pos)->member.next, typeof(*(pos)), member))
+
 /**
   * list_prev_entry - get the prev element in list
   * @pos:the type * to cursor
@@ -449,6 +472,18 @@ static inline void list_splice_init(struct list_head *list,
 #define list_prev_entry(pos, member) \
 list_entry((pos)->member.prev, typeof(*(pos)), member)
 
+/**
+  * list_prev_entry_or_null - get the prev element in list
+  * @pos:the type * to cursor
+  * @member: the name of the struct list_head within the struct.
+  *
+  * Note that if the start of the list is reached, it returns NULL.
+  */
+#define list_prev_entry_or_null(head, pos, member) \
+((!(pos) || list_is_first(&(pos)->member, head))   \
+ ? NULL\
+ : list_entry((pos)->member.prev, typeof(*(pos)), member))
+
 /**
  * list_for_each-iterate over a list
  * @pos:the &struct list_head to use as a loop cursor.
@@ -492,10 +527,10 @@ static inline void list_splice_init(struct list_head 
*list,
  * @head:   the head for your list.
  * @member: the name of the list_struct within the struct.
  */
-#define list_for_each_entry(pos, head, member)  \
-for ((pos) = list_entry((head)->next, typeof(*(pos)), member);  \
- &(pos)->member != (head);  \
- (pos) = list_entry((pos)->member.next, typeof(*(pos)), member))
+#define 

[PATCH v2 2/2] xen/list: fix comments in include/xen/list.h

2025-02-19 Thread Juergen Gross
There are several places in list.h where "list_struct" is used instead
of "struct list_head". Fix that.

Signed-off-by: Juergen Gross 
Acked-by: Jan Beulich 
Release-Acked-by: Oleksii Kurochko 
---
 xen/include/xen/list.h | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/xen/include/xen/list.h b/xen/include/xen/list.h
index d90b3f6f0d..10e6d10522 100644
--- a/xen/include/xen/list.h
+++ b/xen/include/xen/list.h
@@ -395,7 +395,7 @@ static inline void list_splice_init(struct list_head *list,
  * list_entry - get the struct for this entry
  * @ptr:the &struct list_head pointer.
  * @type:the type of the struct this is embedded in.
- * @member:the name of the list_struct within the struct.
+ * @member:the name of the struct list_head within the struct.
  */
 #define list_entry(ptr, type, member) \
 container_of(ptr, type, member)
@@ -404,7 +404,7 @@ static inline void list_splice_init(struct list_head *list,
  * list_first_entry - get the first element from a list
  * @ptr:the list head to take the element from.
  * @type:   the type of the struct this is embedded in.
- * @member: the name of the list_struct within the struct.
+ * @member: the name of the struct list_head within the struct.
  *
  * Note, that list is expected to be not empty.
  */
@@ -415,7 +415,7 @@ static inline void list_splice_init(struct list_head *list,
  * list_last_entry - get the last element from a list
  * @ptr:the list head to take the element from.
  * @type:   the type of the struct this is embedded in.
- * @member: the name of the list_struct within the struct.
+ * @member: the name of the struct list_head within the struct.
  *
  * Note, that list is expected to be not empty.
  */
@@ -426,7 +426,7 @@ static inline void list_splice_init(struct list_head *list,
  * list_first_entry_or_null - get the first element from a list
  * @ptr:the list head to take the element from.
  * @type:   the type of the struct this is embedded in.
- * @member: the name of the list_struct within the struct.
+ * @member: the name of the struct list_head within the struct.
  *
  * Note that if the list is empty, it returns NULL.
  */
@@ -437,7 +437,7 @@ static inline void list_splice_init(struct list_head *list,
  * list_last_entry_or_null - get the last element from a list
  * @ptr:the list head to take the element from.
  * @type:   the type of the struct this is embedded in.
- * @member: the name of the list_struct within the struct.
+ * @member: the name of the struct list_head within the struct.
  *
  * Note that if the list is empty, it returns NULL.
  */
@@ -447,7 +447,7 @@ static inline void list_splice_init(struct list_head *list,
 /**
   * list_next_entry - get the next element in list
   * @pos:the type * to cursor
-  * @member: the name of the list_struct within the struct.
+  * @member: the name of the struct list_head within the struct.
   */
 #define list_next_entry(pos, member) \
 list_entry((pos)->member.next, typeof(*(pos)), member)
@@ -467,7 +467,7 @@ static inline void list_splice_init(struct list_head *list,
 /**
   * list_prev_entry - get the prev element in list
   * @pos:the type * to cursor
-  * @member: the name of the list_struct within the struct.
+  * @member: the name of the struct list_head within the struct.
   */
 #define list_prev_entry(pos, member) \
 list_entry((pos)->member.prev, typeof(*(pos)), member)
@@ -525,7 +525,7 @@ static inline void list_splice_init(struct list_head *list,
  * list_for_each_entry - iterate over list of given type
  * @pos:the type * to use as a loop cursor.
  * @head:   the head for your list.
- * @member: the name of the list_struct within the struct.
+ * @member: the name of the struct list_head within the struct.
  */
 #define list_for_each_entry(pos, head, member)\
 for ( (pos) = list_first_entry_or_null(head, typeof(*(pos)), member); \
@@ -536,7 +536,7 @@ static inline void list_splice_init(struct list_head *list,
  * list_for_each_entry_reverse - iterate backwards over list of given type.
  * @pos:the type * to use as a loop cursor.
  * @head:   the head for your list.
- * @member: the name of the list_struct within the struct.
+ * @member: the name of the struct list_head within the struct.
  */
 #define list_for_each_entry_reverse(pos, head, member)   \
 for ( (pos) = list_last_entry_or_null(head, typeof(*(pos)), member); \
@@ -548,7 +548,7 @@ static inline void list_splice_init(struct list_head *list,
  *  list_for_each_entry_continue
  * @pos:the type * to use as a start point
  * @head:   the head of the list
- * @member: the name of the list_struct within the struct.
+ * @member: the name of the struct list_head within the struct.
  *
  * Prepares a pos entry for use as a start point in
  * list_f

[PATCH v2 0/2] xen/list: some fixes in list.h

2025-02-19 Thread Juergen Gross
Patch 1 is a fix for an undefined behavior reported by Andrew. I think
this patch should be considered for 4.20.

Patch 2 is fixing wrong comments in list.h I stumbled over when doing
patch 1. As it is absolutely no risk involved with this patch, I think
it should be 4.20 material, too.

Changes in V2:
- comments addressed

Juergen Gross (2):
  xen/list: avoid UB in list iterators
  xen/list: fix comments in include/xen/list.h

 xen/drivers/passthrough/x86/hvm.c |   3 +-
 xen/include/xen/list.h| 144 ++
 2 files changed, 90 insertions(+), 57 deletions(-)

-- 
2.43.0




Re: [PATCH 0/2] code style exercise: Drivers folder samples

2025-02-19 Thread Andrew Cooper
On 16/02/2025 5:11 pm, Oleksandr Andrushchenko wrote:
> Hello, Roger!
>
> Please find the branch with all the conversions [1].
> Unfortunately I cannot provide a branch as seen with
> diff --ignore-all-space as such a patch will not simply apply.
>
> Stay safe,
> Oleksandr Andrushchenko
>
> On 16.02.25 13:58, Andrew Cooper wrote:
>> On 16/02/2025 10:21 am, Oleksandr Andrushchenko wrote:
>>> There are two diff files which show what happens in case the same is
>>> applied to the whole xen/drivers directory:
>>> - first one is the result of the "git diff" command, 1.2M [3]
>>> - the second one is for "git diff --ignire-all-space", 600K [4]
>> Please can you format everything, and put it on a branch somewhere, so
>> people can browse.
>>
>> ~Andrew
> [1] https://github.com/andr2000/xen/tree/clang_ml_drivers_v002_diff

That appears to only be drivers/

Please do *everything*.  I want to see what this does to files I
consider to be pretty clean Xen-style.

~Andrew




Re: [PATCH 0/2] code style exercise: Drivers folder samples

2025-02-19 Thread Jan Beulich
On 19.02.2025 13:43, Oleksandr Andrushchenko wrote:
> Hello, Jan, Stefano!
> 
> On 18.02.25 13:34, Jan Beulich wrote:
>> On 18.02.2025 03:36, Stefano Stabellini wrote:
>>> On Mon, 17 Feb 2025, Jan Beulich wrote:
 On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:
> 1. Const string arrays reformatting
> In case the length of items change we might need to introduce a bigger
> change wrt new formatting of unaffected lines
> ==
>
> --- a/xen/drivers/acpi/tables.c
> +++ b/xen/drivers/acpi/tables.c
> @@ -38,10 +38,10 @@
> -static const char *__initdata
> -mps_inti_flags_polarity[] = { "dfl", "high", "res", "low" };
> -static const char *__initdata
> -mps_inti_flags_trigger[] = { "dfl", "edge", "res", "level" };
> +static const char *__initdata mps_inti_flags_polarity[] = { "dfl", 
> "high",
> +"res", "low" 
> };
> +static const char *__initdata mps_inti_flags_trigger[] = { "dfl", 
> "edge", "res",
>
> --- a/xen/drivers/acpi/utilities/utglobal.c
> +++ b/xen/drivers/acpi/utilities/utglobal.c
>   static const char *const 
> acpi_gbl_region_types[ACPI_NUM_PREDEFINED_REGIONS] = {
> - "SystemMemory",
> - "SystemIO",
> - "PCI_Config",
> - "EmbeddedControl",
> - "SMBus",
> - "CMOS",
> - "PCIBARTarget",
> - "DataTable"
> +"SystemMemory", "SystemIO", "PCI_Config",   "EmbeddedControl",
> +"SMBus","CMOS", "PCIBARTarget", "DataTable"
>   };
 Why in the world would a tool need to touch anything like the two examples
 above? My take is that the code is worse readability-wise afterwards.
>>> I think the output is acceptable: not necessarily better than before,
>>> but also not significantly worse.
>> Hmm, for the change to xen/drivers/acpi/tables.c I wouldn't agree with this
>> statement. And for xen/drivers/acpi/utilities/utglobal.c remember that this
>> is code taken from ACPI CA, which we may better not re-format.
> We can use /* clang-format off */ constructs to protect those lines we
> do not want to be touched by clang-format [1]. This is what Grygprii
> mentioned in some other e-mail.

We have fall-through comments. We have SAF comments. Yet another flavor to
keep some external tool happy. If everyone else thinks this is a good idea,
I'm not intending to stand in the way. Yet I don't like this as a workaround.
Instead I think the tool's going too far.

Jan



Re: [PATCH 0/2] code style exercise: Drivers folder samples

2025-02-19 Thread Oleksandr Andrushchenko

Hello, Grygorii!

On 18.02.25 11:56, Grygorii Strashko wrote:



On 16.02.25 12:21, Oleksandr Andrushchenko wrote:

Hello, everybody!

As agreed before [1] I am sending a series to show two samples of the
formatting done with clang-format. The clang-format configuration can be
found at [2]. It already has some notes on previous discussions when
Luca presented his version of that configuration and which I used to
start from.

There are two diff files which show what happens in case the same is
applied to the whole xen/drivers directory:
- first one is the result of the "git diff" command, 1.2M [3]
- the second one is for "git diff --ignire-all-space", 600K [4]

This is not to limit the reviewers from seeing a bigger picture and not
only the files in this series.
N.B. xen/drivers uses tabs a lot, so this is no surprise the size
difference is big enough: roughly space conversion is half of the changes.

While reviewing the changes I have collected some of the unexpected things
done by clang-format and some interesting pieces. You can find those
below. For some of those I have filed an issue on clang-format and hope the
community will lead me in resolving those. Of course what I collected is
not the complete list of such changes, so I hope we can discuss missed
ones here.

 From this exercise I can definitely tell that clang-format does help a
lot and has potential to be employed as a code formatting tool for Xen.
Of course it cannot be used as is now and will require discussions on the
Xen coding style and possibly submitting patches to clang-format to
satisfy those which cannot be handled by the tool now.

Stay safe,
Oleksandr Andrushchenko

1. Const string arrays reformatting
In case the length of items change we might need to introduce a bigger
change wrt new formatting of unaffected lines
==

--- a/xen/drivers/acpi/tables.c
+++ b/xen/drivers/acpi/tables.c
@@ -38,10 +38,10 @@
-static const char *__initdata
-mps_inti_flags_polarity[] = { "dfl", "high", "res", "low" };
-static const char *__initdata
-mps_inti_flags_trigger[] = { "dfl", "edge", "res", "level" };
+static const char *__initdata mps_inti_flags_polarity[] = { "dfl", "high",
+ "res", "low" };
+static const char *__initdata mps_inti_flags_trigger[] = { "dfl", "edge", 
"res",

--- a/xen/drivers/acpi/utilities/utglobal.c
+++ b/xen/drivers/acpi/utilities/utglobal.c
  static const char *const acpi_gbl_region_types[ACPI_NUM_PREDEFINED_REGIONS] = 
{
-    "SystemMemory",
-    "SystemIO",
-    "PCI_Config",
-    "EmbeddedControl",
-    "SMBus",
-    "CMOS",
-    "PCIBARTarget",
-    "DataTable"
+    "SystemMemory", "SystemIO", "PCI_Config", "EmbeddedControl",
+    "SMBus",    "CMOS", "PCIBARTarget", "DataTable"
  };

2. Long strings in ptintk violations
I filed an issue on printk format strings [5]
==
@@ -225,9 +231,9 @@ void __init acpi_table_print_madt_entry(struct 
acpi_subtable_header *header)
  printk(KERN_DEBUG PREFIX
-   "GIC Distributor (gic_id[0x%04x] address[0x%"PRIx64"] 
gsi_base[%d])\n",
-   p->gic_id, p->base_address,
-   p->global_irq_base);
+   "GIC Distributor (gic_id[0x%04x] address[0x%" PRIx64
+   "] gsi_base[%d])\n",
+   p->gic_id, p->base_address, p->global_irq_base);

@@ -629,12 +628,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
-   printk(XENLOG_ERR VTDPREFIX
-  "Overlapping RMRRs [%"PRIx64",%"PRIx64"] and 
[%"PRIx64",%"PRIx64"]\n",
-  rmrru->base_address, rmrru->end_address,
-  base_addr, end_addr);
+    printk(XENLOG_ERR VTDPREFIX "Overlapping RMRRs [%" PRIx64
+    ",%" PRIx64 "] and [%" PRIx64
+    ",%" PRIx64 "]\n",
+   rmrru->base_address, rmrru->end_address, base_addr,
+   end_addr);


It doesn't understand properly things like "PRIx64" :(

Most of other items, I've mentioned already,

Unhappy: it's probably "known" things - identification of complex defines and
static struct/arrays declarations. 


And for them, probably, no magic automation tools exist which will satisfy 
everybody as,
at least static definitions are usually formatted to achieve better readability
which is always subjective.

Strongly agree


The tags /* clang-format off/clang-format on */ might be useful.


Yes, I guess we will be forced to use those

[..]

Honestly, It looks a bit strange that Xen community is considering batch 
automated code formatting,
For example Linux kernel cleanly rejected such approach.
Linux kernel docs "4.1.1. Coding style" section [1].

Another thing, checking the new code and accepting code style violations (if 
any) on per-patch basis.

The main difference, and it was brought by Jan before [1], the poss

Re: [PATCH 0/2] code style exercise: Drivers folder samples

2025-02-19 Thread Oleksandr Andrushchenko

Hello, Andrew!

On 19.02.25 14:49, Andrew Cooper wrote:

On 16/02/2025 5:11 pm, Oleksandr Andrushchenko wrote:

Hello, Roger!

Please find the branch with all the conversions [1].
Unfortunately I cannot provide a branch as seen with
diff --ignore-all-space as such a patch will not simply apply.

Stay safe,
Oleksandr Andrushchenko

On 16.02.25 13:58, Andrew Cooper wrote:

On 16/02/2025 10:21 am, Oleksandr Andrushchenko wrote:

There are two diff files which show what happens in case the same is
applied to the whole xen/drivers directory:
- first one is the result of the "git diff" command, 1.2M [3]
- the second one is for "git diff --ignire-all-space", 600K [4]

Please can you format everything, and put it on a branch somewhere, so
people can browse.

~Andrew

[1] https://github.com/andr2000/xen/tree/clang_ml_drivers_v002_diff

That appears to only be drivers/

I thought that was the agreement, so we start from the drivers first


Please do *everything*.  I want to see what this does to files I
consider to be pretty clean Xen-style.

Sure


~Andrew


Thank you,
Oleksandr



Re: [XEN RFC PATCH v6 02/11] docs/designs: Add a design document for PV-IOMMU

2025-02-19 Thread Frediano Ziglio

On 17/02/2025 10:18, Teddy Astie wrote:

Some operating systems want to use IOMMU to implement various features (e.g
VFIO) or DMA protection.
This patch introduce a proposal for IOMMU paravirtualization for Dom0.

Signed-off-by: Teddy Astie 
---
  docs/designs/pv-iommu.md | 116 +++
  1 file changed, 116 insertions(+)
  create mode 100644 docs/designs/pv-iommu.md

diff --git a/docs/designs/pv-iommu.md b/docs/designs/pv-iommu.md
new file mode 100644
index 00..7df9fa0b94
--- /dev/null
+++ b/docs/designs/pv-iommu.md
@@ -0,0 +1,116 @@
+# IOMMU paravirtualization for Dom0
+
+Status: Experimental
+
+# Background
+
+By default, Xen only uses the IOMMU for itself, either to make device adress
+space coherent with guest adress space (x86 HVM/PVH) or to prevent devices


typo: adress -> address


+from doing DMA outside it's expected memory regions including the hypervisor
+(x86 PV).
+
+A limitation is that guests (especially privildged ones) may want to use


typo: privildged -> privileged


+IOMMU hardware in order to implement features such as DMA protection and
+VFIO [1] as IOMMU functionality is not available outside of the hypervisor
+currently.
+
+[1] VFIO - "Virtual Function I/O" - 
https://www.kernel.org/doc/html/latest/driver-api/vfio.html
+
+# Design
+
+The operating system may want to have access to various IOMMU features such as
+context management and DMA remapping. We can create a new hypercall that allows
+the guest to have access to a new paravirtualized IOMMU interface.
+
+This feature is only meant to be available for the Dom0, as DomU have some
+emulated devices that can't be managed on Xen side and are not hardware, we
+can't rely on the hardware IOMMU to enforce DMA remapping.
+
+This interface is exposed under the `iommu_op` hypercall.
+
+In addition, Xen domains are modified in order to allow existence of several
+IOMMU context including a default one that implement default behavior (e.g
+hardware assisted paging) and can't be modified by guest. DomU cannot have
+contexts, and therefore act as if they only have the default domain.
+
+Each IOMMU context within a Xen domain is identified using a domain-specific
+context number that is used in the Xen IOMMU subsystem and the hypercall
+interface.
+
+The number of IOMMU context a domain is specified by either the toolstack or
+the domain itself.


I don't understand what you want express with the above sentence.
Maybe it's just me.


+
+# IOMMU operations
+
+## Initialize PV-IOMMU
+
+Initialize PV-IOMMU for the domain.
+It can only be called once.
+


Could this operation be done automatically on first context allocation ?


+## Alloc context
+
+Create a new IOMMU context for the guest and return the context number to the
+guest.
+Fail if the IOMMU context limit of the guest is reached.
+
+A flag can be specified to create a identity mapping.
+
+## Free context
+
+Destroy a IOMMU context created previously.
+It is not possible to free the default context.
+
+Reattach context devices to default context if specified by the guest.
+
+Fail if there is a device in the context and reattach-to-default flag is not
+specified.
+
+## Reattach device
+
+Reattach a device to another IOMMU context (including the default one).
+The target IOMMU context number must be valid and the context allocated.
+
+The guest needs to specify a PCI SBDF of a device he has access to.
+
+## Map/unmap page
+
+Map/unmap a page on a context.
+The guest needs to specify a gfn and target dfn to map.
+
+Refuse to create the mapping if one already exist for the same dfn.
+
+## Lookup page
+
+Get the gfn mapped by a specific dfn.
+
+## Remote command
+
+Make a PV-IOMMU operation on behalf of another domain.
+Especially useful for implementing IOMMU emulation (e.g using QEMU)
+or initializing PV-IOMMU with enforced limits.
+
+# Implementation considerations
+
+## Hypercall batching
+
+In order to prevent unneeded hypercalls and IOMMU flushing, it is advisable to
+be able to batch some critical IOMMU operations (e.g map/unmap multiple pages).
+


I suppose that batching also implies preemption.


+## Hardware without IOMMU support
+
+Operating system needs to be aware on PV-IOMMU capability, and whether it is
+able to make contexts. However, some operating system may critically fail in
+case they are able to make a new IOMMU context. Which is supposed to happen
+if no IOMMU hardware is available.
+
+The hypercall interface needs a interface to advertise the ability to create
+and manage IOMMU contexts including the amount of context the guest is able
+to use. Using these informations, the Dom0 may decide whether to use or not
+the PV-IOMMU interface.
+
+## Page pool for contexts
+
+In order to prevent unexpected starving on the hypervisor memory with a
+buggy Dom0. We can preallocate the pages the contexts will use and make
+map/unmap use these pages instead of allocating them dynamically.
+


Regards,
  Frediano




Re: [XEN RFC PATCH v6 07/11] iommu: Simplify hardware did management

2025-02-19 Thread Frediano Ziglio

On 17/02/2025 10:18, Teddy Astie wrote:

Simplify the hardware DID management by allocating a DID
per IOMMU context (currently Xen domain) instead of trying
to reuse Xen domain DID (which may not be possible depending
on hardware constraints like did limits).


Minor: here and in the title, did should be DID if it's the acronym, 
otherwise can be confusing.


...

Frediano




Re: [XEN RFC PATCH v6 07/11] iommu: Simplify hardware did management

2025-02-19 Thread Frediano Ziglio

On 17/02/2025 10:18, Teddy Astie wrote:

Simplify the hardware DID management by allocating a DID
per IOMMU context (currently Xen domain) instead of trying
to reuse Xen domain DID (which may not be possible depending
on hardware constraints like did limits).


Minor: here and in the title, did should be DID if it's the acronym, 
otherwise can be confusing.


...

Frediano




Re: [PATCH 0/2] code style exercise: Drivers folder samples

2025-02-19 Thread Oleksandr Andrushchenko

Hello, Jan, Stefano!

On 18.02.25 13:34, Jan Beulich wrote:

On 18.02.2025 03:36, Stefano Stabellini wrote:

On Mon, 17 Feb 2025, Jan Beulich wrote:

On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:

1. Const string arrays reformatting
In case the length of items change we might need to introduce a bigger
change wrt new formatting of unaffected lines
==

--- a/xen/drivers/acpi/tables.c
+++ b/xen/drivers/acpi/tables.c
@@ -38,10 +38,10 @@
-static const char *__initdata
-mps_inti_flags_polarity[] = { "dfl", "high", "res", "low" };
-static const char *__initdata
-mps_inti_flags_trigger[] = { "dfl", "edge", "res", "level" };
+static const char *__initdata mps_inti_flags_polarity[] = { "dfl", "high",
+"res", "low" };
+static const char *__initdata mps_inti_flags_trigger[] = { "dfl", "edge", 
"res",

--- a/xen/drivers/acpi/utilities/utglobal.c
+++ b/xen/drivers/acpi/utilities/utglobal.c
  static const char *const acpi_gbl_region_types[ACPI_NUM_PREDEFINED_REGIONS] = 
{
-   "SystemMemory",
-   "SystemIO",
-   "PCI_Config",
-   "EmbeddedControl",
-   "SMBus",
-   "CMOS",
-   "PCIBARTarget",
-   "DataTable"
+"SystemMemory", "SystemIO", "PCI_Config",   "EmbeddedControl",
+"SMBus","CMOS", "PCIBARTarget", "DataTable"
  };

Why in the world would a tool need to touch anything like the two examples
above? My take is that the code is worse readability-wise afterwards.

I think the output is acceptable: not necessarily better than before,
but also not significantly worse.

Hmm, for the change to xen/drivers/acpi/tables.c I wouldn't agree with this
statement. And for xen/drivers/acpi/utilities/utglobal.c remember that this
is code taken from ACPI CA, which we may better not re-format.

We can use /* clang-format off */ constructs to protect those lines we
do not want to be touched by clang-format [1]. This is what Grygprii
mentioned in some other e-mail.



To me, the main takeaway is that there
are many unavoidable but unnecessary changes.

Interesting. I'd say it slightly differently: The main takeaway is that
there are many avoidable / unnecessary changes.

But at the end of the day it will allow using automatic formatting...


Jan

Thank you,
Oleksandr
[1] 
https://clang.llvm.org/docs/ClangFormatStyleOptions.html#disabling-formatting-on-a-piece-of-code



Re: [PATCH v3 1/2] xen/passthrough: Provide stub functions when !HAS_PASSTHROUGH

2025-02-19 Thread Jan Beulich
On 18.02.2025 10:51, Luca Fancellu wrote:
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -110,6 +110,8 @@ extern int8_t iommu_hwdom_reserved;
>  
>  extern unsigned int iommu_dev_iotlb_timeout;
>  
> +#ifdef CONFIG_HAS_PASSTHROUGH
> +
>  int iommu_setup(void);
>  int iommu_hardware_setup(void);
>  
> @@ -122,6 +124,28 @@ int arch_iommu_domain_init(struct domain *d);
>  void arch_iommu_check_autotranslated_hwdom(struct domain *d);
>  void arch_iommu_hwdom_init(struct domain *d);
>  
> +#else
> +
> +static inline int iommu_setup(void)
> +{
> +return -ENODEV;
> +}
> +
> +static inline int iommu_domain_init(struct domain *d, unsigned int opts)
> +{
> +/*
> + * When !HAS_PASSTHROUGH, iommu_enabled is set to false and the expected
> + * behaviour of this function is to return success in that case.
> + */
> +return 0;
> +}

Hmm. Would the function be anywhere near likely to do anything else than
what it's expected to do? My original concern here was with "opts"
perhaps asking for something that cannot be supported. But that was wrong
thinking on my part. Here what you do is effectively open-code what the
real iommu_domain_init() would do: Return success when !is_iommu_enabled().
Which in turn follows from !iommu_enabled when !HAS_PASSTHROUGH.

On that basis I'd be okay if the comment was dropped again. Else it imo
wants re-wording to get closer to the explanation above.

> @@ -383,12 +429,12 @@ struct domain_iommu {
>  #define iommu_set_feature(d, f)   set_bit(f, dom_iommu(d)->features)
>  #define iommu_clear_feature(d, f) clear_bit(f, dom_iommu(d)->features)
>  
> +#ifdef CONFIG_HAS_PASSTHROUGH
>  /* Are we using the domain P2M table as its IOMMU pagetable? */
>  #define iommu_use_hap_pt(d)   (IS_ENABLED(CONFIG_HVM) && \
> dom_iommu(d)->hap_pt_share)
>  
>  /* Does the IOMMU pagetable need to be kept synchronized with the P2M */
> -#ifdef CONFIG_HAS_PASSTHROUGH
>  #define need_iommu_pt_sync(d) (dom_iommu(d)->need_sync)
>  
>  int iommu_do_domctl(struct xen_domctl *domctl, struct domain *d,

Coming back to my v2 comment: Why exactly is this change needed here? If
things build fine without the macro being defined when !HAS_PASSTHROUGH,
surely they will also build fine with it being defined? As per the
respective revlog entry, this change looks to belong into whatever is
going to be done to deal with the one Arm use of the macro. Or maybe
it's unneeded altogether.

Jan



Re: [PATCH v3 1/2] xen/passthrough: Provide stub functions when !HAS_PASSTHROUGH

2025-02-19 Thread Luca Fancellu


> On 19 Feb 2025, at 12:45, Jan Beulich  wrote:
> 
> On 18.02.2025 10:51, Luca Fancellu wrote:
>> --- a/xen/include/xen/iommu.h
>> +++ b/xen/include/xen/iommu.h
>> @@ -110,6 +110,8 @@ extern int8_t iommu_hwdom_reserved;
>> 
>> extern unsigned int iommu_dev_iotlb_timeout;
>> 
>> +#ifdef CONFIG_HAS_PASSTHROUGH
>> +
>> int iommu_setup(void);
>> int iommu_hardware_setup(void);
>> 
>> @@ -122,6 +124,28 @@ int arch_iommu_domain_init(struct domain *d);
>> void arch_iommu_check_autotranslated_hwdom(struct domain *d);
>> void arch_iommu_hwdom_init(struct domain *d);
>> 
>> +#else
>> +
>> +static inline int iommu_setup(void)
>> +{
>> +return -ENODEV;
>> +}
>> +
>> +static inline int iommu_domain_init(struct domain *d, unsigned int opts)
>> +{
>> +/*
>> + * When !HAS_PASSTHROUGH, iommu_enabled is set to false and the expected
>> + * behaviour of this function is to return success in that case.
>> + */
>> +return 0;
>> +}
> 
> Hmm. Would the function be anywhere near likely to do anything else than
> what it's expected to do? My original concern here was with "opts"
> perhaps asking for something that cannot be supported. But that was wrong
> thinking on my part. Here what you do is effectively open-code what the
> real iommu_domain_init() would do: Return success when !is_iommu_enabled().
> Which in turn follows from !iommu_enabled when !HAS_PASSTHROUGH.
> 
> On that basis I'd be okay if the comment was dropped again. Else it imo
> wants re-wording to get closer to the explanation above.

Would it be ok for you a comment saying:
“This stub returns the same as the real iommu_domain_init()
 function: success when !is_iommu_enabled(), which value is based on 
iommu_enabled
that is false when !HAS_PASSTHROUGH"


> 
>> @@ -383,12 +429,12 @@ struct domain_iommu {
>> #define iommu_set_feature(d, f)   set_bit(f, dom_iommu(d)->features)
>> #define iommu_clear_feature(d, f) clear_bit(f, dom_iommu(d)->features)
>> 
>> +#ifdef CONFIG_HAS_PASSTHROUGH
>> /* Are we using the domain P2M table as its IOMMU pagetable? */
>> #define iommu_use_hap_pt(d)   (IS_ENABLED(CONFIG_HVM) && \
>>dom_iommu(d)->hap_pt_share)
>> 
>> /* Does the IOMMU pagetable need to be kept synchronized with the P2M */
>> -#ifdef CONFIG_HAS_PASSTHROUGH
>> #define need_iommu_pt_sync(d) (dom_iommu(d)->need_sync)
>> 
>> int iommu_do_domctl(struct xen_domctl *domctl, struct domain *d,
> 
> Coming back to my v2 comment: Why exactly is this change needed here? If
> things build fine without the macro being defined when !HAS_PASSTHROUGH,
> surely they will also build fine with it being defined?

I’ve defined an empty stub on an header included only on MPU systems for the
p2m module, this is why it is building

I didn’t modify p2m_set_way_flush() which lives in arm common code, because
it will be used also on MPU systems (R82 has MPU at EL2 but MMU/MPU at EL1)
and I would like to stay the same and be used by MMU/MPU subsystems.

> As per the
> respective revlog entry, this change looks to belong into whatever is
> going to be done to deal with the one Arm use of the macro. Or maybe
> it's unneeded altogether.

I didn’t understand that you were opposing to protecting iommu_use_hap_pt() when
!HAS_PASSTHROUGH, I thought you were referring only to the stub in the #else
branch.
Can I ask why? in any case when !HAS_PASSTHROUGH, this macro is not usable
since dom_iommu() is resolved to a membed that doesn’t exist in the 
configuration,
am I missing something?

Cheers,
Luca



Re: [PATCH 1/2] code style: Format ns16550 driver

2025-02-19 Thread Oleksandr Andrushchenko

Hello, Jan!

On 19.02.25 14:39, Oleksandr Andrushchenko wrote:

Hello, Jan!

On 17.02.25 12:20, Jan Beulich wrote:

On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:

--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -14,7 +14,7 @@
   * abstracted.
   */
  #if defined(CONFIG_HAS_PCI) && defined(CONFIG_X86)
-# define NS16550_PCI
+#define NS16550_PCI
  #endif

Hmm. Either form ought to be okay, so the line would want leaving untouched.

It seems this can actually have 3 forms under IndentPPDirectives control
Please see [1].

I would go with BeforeHash personally

[1] https://clang.llvm.org/docs/ClangFormatStyleOptions.html#indentppdirectives





Re: [PATCH for 4.20? v4 1/3] xen/riscv: implement software page table walking

2025-02-19 Thread Oleksii Kurochko


On 2/19/25 12:14 PM, Jan Beulich wrote:

On 12.02.2025 17:50, Oleksii Kurochko wrote:

--- a/xen/arch/riscv/pt.c
+++ b/xen/arch/riscv/pt.c
@@ -185,6 +185,68 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, 
unsigned int offset)
  return XEN_TABLE_NORMAL;
  }
  
+/*

+ * _pt_walk() performs software page table walking and returns the pte_t of
+ * a leaf node or the leaf-most not-present pte_t if no leaf node is found
+ * for further analysis.
+ *
+ * Additionally, _pt_walk() returns the level of the found pte by using
+ * `pte_level` argument.
+ * `pte_level` is optional, set `pte_level`=NULL if a caller doesn't need
+ * the level of the found pte.

How about this, reducing redundancy a little?

  * _pt_walk() can optionally return the level of the found pte. Pass NULL
  * for `pte_level` if this information isn't needed.


+pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
+{
+pte_t *entry = _pt_walk(va, pte_level);
+pte_t pte = *entry;
+
+unmap_table(entry);
+
+return pte;
+}

"entry" especially in this context is ambiguous. I would expect a variable of
this name to be of type pte_t, not pte_t *. How about "ptep"?


Agree with both your suggestions, it would be better to use `ptep instead of 
`entry`
and rephrase the comment.



Preferably with these adjustments, which I'd be fine making while committing,
Reviewed-by: Jan Beulich

Considering the 4.20? tag you'll need to decide whether you still want this
in before the release.


Considering that it is still needed a new version for patch3 of this patch 
series and
that the mentioned issues aren't affected no one, lets consider the full patch 
series for
4.21.

Thanks.

~ Oleksii


Re: [PATCH for 4.20? v4 3/3] xen/riscv: update mfn calculation in pt_mapping_level()

2025-02-19 Thread Oleksii Kurochko


On 2/19/25 12:28 PM, Jan Beulich wrote:

On 12.02.2025 17:50, Oleksii Kurochko wrote:

--- a/xen/arch/riscv/pt.c
+++ b/xen/arch/riscv/pt.c
@@ -249,12 +249,10 @@ pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
  
  /* Update an entry at the level @target. */

  static int pt_update_entry(mfn_t root, vaddr_t virt,
-   mfn_t mfn, unsigned int target,
+   mfn_t mfn, unsigned int *target,
 unsigned int flags)
  {
  int rc;
-unsigned int level = HYP_PT_ROOT_LEVEL;
-pte_t *table;
  /*
   * The intermediate page table shouldn't be allocated when MFN isn't
   * valid and we are not populating page table.
@@ -265,41 +263,48 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
   * combinations of (mfn, flags).
  */
  bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
-pte_t pte, *entry;
-
-/* convenience aliases */
-DECLARE_OFFSETS(offsets, virt);
+pte_t pte, *entry = NULL;

With there also being "table" below, "entry" isn't quite as bad as in the
other patch. Yet I'd still like to ask that you consider renaming.


-table = map_table(root);
-for ( ; level > target; level-- )
+if ( *target == CONFIG_PAGING_LEVELS )
+entry = _pt_walk(virt, target);

Imo it's quite important for the comment ahead of the function to be updated
to mention this special case.


+else
  {
-rc = pt_next_level(alloc_tbl, &table, offsets[level]);
-if ( rc == XEN_TABLE_MAP_NOMEM )
+pte_t *table;
+unsigned int level = HYP_PT_ROOT_LEVEL;
+/* convenience aliases */

Nit: Style.


From the 'Comments' section of CODING_STYLE, I see that the comment should start
with capital letter. Do you mean that?




@@ -331,7 +336,8 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
  rc = 0;
  
   out:

-unmap_table(table);
+if ( entry )
+unmap_table(entry);

Would it perhaps be worth for unmap_table() to gracefully handle being passed
NULL, to avoid such conditionals (there may be more in the future)?


Agree, it would be more safe to move this check inside unmap_table(). I will 
update
that.

Thanks.

~ Oleksii


Re: [PATCH for 4.21 v6 1/2] xen/riscv: drop CONFIG_RISCV_ISA_RV64G

2025-02-19 Thread Oleksii Kurochko


On 2/18/25 6:03 PM, Jan Beulich wrote:

On 12.02.2025 17:50, Oleksii Kurochko wrote:

--- a/xen/arch/riscv/Kconfig
+++ b/xen/arch/riscv/Kconfig
@@ -28,16 +28,6 @@ choice
help
  This selects the base ISA extensions that Xen will target.
  
-config RISCV_ISA_RV64G

-   bool "RV64G"
-   help
- Use the RV64I base ISA, plus
- "M" for multiply/divide,
- "A" for atomic instructions,
- “F”/"D" for  {single/double}-precision floating-point instructions,
- "Zicsr" for control and status register access,
- "Zifencei" for instruction-fetch fence.
-
  endchoice

Shouldn't the choice be removed altogether then, for now being empty?


Overlooked that, "Base ISA" choice could be removed too then. or just change to:
choice
prompt "Base ISA"
default "ima" if RISCV_64
help
  This selects the base ISA extensions that Xen will target.

endchoice




--- a/xen/arch/riscv/arch.mk
+++ b/xen/arch/riscv/arch.mk
@@ -6,8 +6,13 @@ $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
  riscv-abi-$(CONFIG_RISCV_32) := -mabi=ilp32
  riscv-abi-$(CONFIG_RISCV_64) := -mabi=lp64
  
-riscv-march-$(CONFIG_RISCV_ISA_RV64G) := rv64g

-riscv-march-$(CONFIG_RISCV_ISA_C)   := $(riscv-march-y)c
+riscv-march-$(CONFIG_RISCV_64) := rv64
+
+riscv-march-y := $(riscv-march-y)ima
+
+riscv-march-$(CONFIG_RISCV_ISA_C) := $(riscv-march-y)c
+
+riscv-march-y := $(riscv-march-y)_zicsr_zifencei

The repeated use of := makes this longer than necessary, and hence harder to
read. I understand using += isn't exactly ideal either, because then on the rhs
no blanks may appear (aiui), being kind of against our style and potentially
hampering readability. Still maybe:

riscv-march-$(CONFIG_RISCV_64) := rv64
riscv-march-y+=ima
riscv-march-$(CONFIG_RISCV_ISA_C)+=c
riscv-march-y+=_zicsr_zifencei

?


From my point of view both options hard to read but `+=`, at the moment, look a
little bit better. I will update correspondingly.

Thanks.

~ Oleksii


Re: [PATCH 1/2] code style: Format ns16550 driver

2025-02-19 Thread Jan Beulich
On 19.02.2025 15:23, Oleksandr Andrushchenko wrote:
> On 19.02.25 14:39, Oleksandr Andrushchenko wrote:
>> On 17.02.25 12:20, Jan Beulich wrote:
>>> On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:
 --- a/xen/drivers/char/ns16550.c
 +++ b/xen/drivers/char/ns16550.c
 @@ -14,7 +14,7 @@
    * abstracted.
    */
   #if defined(CONFIG_HAS_PCI) && defined(CONFIG_X86)
 -# define NS16550_PCI
 +#define NS16550_PCI
   #endif
>>> Hmm. Either form ought to be okay, so the line would want leaving untouched.
> It seems this can actually have 3 forms under IndentPPDirectives control
> Please see [1].

I saw that. None of the three variants matches the original above, i.e.
here we're again observing a change just for the sake of making a change.
The style used is conforming, after all.

Jan

> I would go with BeforeHash personally
> 
> [1] 
> https://clang.llvm.org/docs/ClangFormatStyleOptions.html#indentppdirectives
> 
> 




Re: [PATCH for 4.21 v6 1/2] xen/riscv: drop CONFIG_RISCV_ISA_RV64G

2025-02-19 Thread Jan Beulich
On 19.02.2025 15:55, Oleksii Kurochko wrote:
> On 2/18/25 6:03 PM, Jan Beulich wrote:
>> On 12.02.2025 17:50, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/Kconfig
>>> +++ b/xen/arch/riscv/Kconfig
>>> @@ -28,16 +28,6 @@ choice
>>> help
>>>   This selects the base ISA extensions that Xen will target.
>>>   
>>> -config RISCV_ISA_RV64G
>>> -   bool "RV64G"
>>> -   help
>>> - Use the RV64I base ISA, plus
>>> - "M" for multiply/divide,
>>> - "A" for atomic instructions,
>>> - “F”/"D" for  {single/double}-precision floating-point instructions,
>>> - "Zicsr" for control and status register access,
>>> - "Zifencei" for instruction-fetch fence.
>>> -
>>>   endchoice
>> Shouldn't the choice be removed altogether then, for now being empty?
> 
> Overlooked that, "Base ISA" choice could be removed too then. or just change 
> to:
> choice
>   prompt "Base ISA"
>   default "ima" if RISCV_64
>   help
> This selects the base ISA extensions that Xen will target.
> 
> endchoice

Besides me wondering what use that would be (there's no variable to store
the "ima" into), I kind of suspect kconfig might choke on the construct.
Plus even if there was some variable, I'd then ask where it is used. There
isn't a lot of sense in having a Kconfig setting with no user.

Jan



Re: [PATCH 1/2] code style: Format ns16550 driver

2025-02-19 Thread Oleksandr Andrushchenko

Hello, Jan!

On 19.02.25 15:18, Jan Beulich wrote:

On 19.02.2025 13:39, Oleksandr Andrushchenko wrote:

On 17.02.25 12:20, Jan Beulich wrote:

On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:

@@ -43,12 +43,12 @@
   
   static struct ns16550 {

   int baud, clock_hz, data_bits, parity, stop_bits, fifo_size, irq;
-u64 io_base;   /* I/O port or memory-mapped I/O address. */
+u64 io_base; /* I/O port or memory-mapped I/O address. */
   u64 io_size;
   int reg_shift; /* Bits to shift register offset by */

Breaking alignment between comments (also in the immediately following hunk),
while at the same time ...

This one...

   int reg_width; /* Size of access to use, the registers
   * themselves are still bytes */

... not taking care of the comment style violation here?

This is controlled by ReflowComments option [3]:

  From the tool point of view the comment is formatted correctly
I didn't find a way to convert such comments into
/*
   * Size of access to use, the registers * themselves are still bytes */ if 
this is what you mean.

Above you see what I received. I can't really deduce from this what
formatting you suggested. In the case at hand, though, I think it's not
the best idea anyway to put a multi-line comment past a declaration (or
statement).

Sorry, for the formatting

/*
 * Size of access to use, the registers
 * themselves are still bytes
 */

This is what I was trying to send

int reg_width;

is what I think would be better in such a case (artificially keeping
this to be multi-line, even if it looks as if it might fit on just one
line then).


@@ -248,8 +249,9 @@ static int cf_check ns16550_tx_ready(struct serial_port 
*port)
   if ( ns16550_ioport_invalid(uart) )
   return -EIO;
   
-return ( (ns_read_reg(uart, UART_LSR) &

-  uart->lsr_mask ) == uart->lsr_mask ) ? uart->fifo_size : 0;
+return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask)
+   ? uart->fifo_size
+   : 0;

Indentation of the ? and : lines is clearly wrong here? What is the tool
doing?

There are number of options that have influence on this formatting:
AllowShortBlocksOnASingleLine [4]
BreakBeforeTernaryOperators [5]
AlignOperands [6]

I was not able to tweak these options to have the previous form.

Right, sticking to the original form (with just the stray blanks zapped)
would of course be best. Yet again - the tool is doing more transformations
despite there not being any need. If, however, it does so, then one of my
expectations would be that the ? and : are properly indented:

 return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask)
? uart->fifo_size
: 0;

This only differs from what the tool is doing by the fact it applies
the following rule: *IndentWidth: 4*, e.g. it has indented your construct
by 4 spaces, see [1]. Which, IMO, is acceptable change.


or

 return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask)
? uart->fifo_size : 0;

or

 return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask
 ? uart->fifo_size
 : 0);

(not going to list more variants which are all okay).

In any event, a fundamental requirement of mine is that such a tool would
only apply adjustments when and where style is actively violated. I.e. in
the case here:

return ((ns_read_reg(uart, UART_LSR) &
 uart->lsr_mask) == uart->lsr_mask) ? uart->fifo_size : 0;

That's not overly neat wrapping, but in line with our style. If the other
form was demanded going forward, I'd be curious how you'd verbally
describe the requirement in ./CODING_STYLE.

I believe this can be stated around the fact that we need to indent,
e.g. apply the same rule as for other constructs already in use



@@ -275,9 +277,10 @@ static void pci_serial_early_init(struct ns16550 *uart)
   #ifdef NS16550_PCI
   if ( uart->bar && uart->io_base >= 0x1 )
   {
-pci_conf_write16(PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1],
-  uart->ps_bdf[2]),
- PCI_COMMAND, PCI_COMMAND_MEMORY);
+pci_conf_write16(
+PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1], uart->ps_bdf[2]),
+PCI_COMMAND,
+PCI_COMMAND_MEMORY);
   return;
   }

Hmm, transforming a well-formed block into another well-formed one. No
gain? (Same again further down.)

No, gain from human point of view
But there is a gain that it is now formatted automatically.

See above: I'd first like to see a written, textual description for all these
requirements. After all it needs to be possible for a human to write code
that the tool then wouldn't try to re-arrange. Which in turn requires that
the restrictions / constraints on the layout are spelled out.

Agree, the existing coding style document will require some exten

Re: [PATCH] xen/console: introduce console_{get,put}_domain()

2025-02-19 Thread Jan Beulich
On 18.02.2025 09:31, dm...@proton.me wrote:
> @@ -546,31 +555,23 @@ static void __serial_rx(char c)
>   * getting stuck.
>   */
>  send_global_virq(VIRQ_CONSOLE);
> -break;
> -
> +}
>  #ifdef CONFIG_SBSA_VUART_CONSOLE
> -default:
> -{
> -struct domain *d = rcu_lock_domain_by_id(console_rx - 1);
> -
> -if ( d )
> -{
> -int rc = vpl011_rx_char_xen(d, c);
> -if ( rc )
> -guest_printk(d, XENLOG_G_WARNING
> -"failed to process console input: %d\n", rc);
> -rcu_unlock_domain(d);
> -}
> -
> -break;
> -}
> +else
> +/* Deliver input to the emulated UART. */
> +rc = vpl011_rx_char_xen(d, c);
>  #endif
> -}
>  
>  #ifdef CONFIG_X86
>  if ( pv_shim && pv_console )
>  consoled_guest_tx(c);
>  #endif
> +
> +if ( rc )
> +guest_printk(d, XENLOG_G_WARNING
> +"failed to process console input: %d\n", rc);

Wouldn't this better live ahead of the four shim related lines?

Also please correct the log level specifier here. I realize you only move
what was there before, but I consider i bad practice to move buggy code.
gprintk() already prepends XENLOG_GUEST, so instead of XENLOG_G_WARNING
is should just be XENLOG_WARNING. (Line wrapping is also odd here, at
least according to my taste. But since that's not written down anywhere,
I wouldn't insist on adjusting that as well.)

With both adjustments (provided you agree, of course)
Reviewed-by: Jan Beulich 
Given they're reasonably simple to make, I could once again offer to
adjust while committing (provided an Arm ack also arrives).

Jan



Re: [XEN RFC PATCH v6 02/11] docs/designs: Add a design document for PV-IOMMU

2025-02-19 Thread Teddy Astie
Hello Frediano,

Ok for typos fixes

Le 19/02/2025 à 13:02, Frediano Ziglio a écrit :
> On 17/02/2025 10:18, Teddy Astie wrote:
>> +Each IOMMU context within a Xen domain is identified using a domain-
>> specific
>> +context number that is used in the Xen IOMMU subsystem and the hypercall
>> +interface.
>> +
>> +The number of IOMMU context a domain is specified by either the
>> toolstack or
>> +the domain itself.
>
> I don't understand what you want express with the above sentence.
> Maybe it's just me.
>
>> +
>> +# IOMMU operations
>> +
>> +## Initialize PV-IOMMU
>> +
>> +Initialize PV-IOMMU for the domain.
>> +It can only be called once.
>> +
>
> Could this operation be done automatically on first context allocation ?
>

For initializing PV-IOMMU, you need to pass some additional parameters
(memory/context limits). To avoid a guest from initializing with
arbitrary limits, it can also be done by the toolstack (e.g domain
builder) to enforce some specific limitations as this initialization can
only be done once.

>> +## Alloc context
>> +
>> +Create a new IOMMU context for the guest and return the context
>> number to the
>> +guest.
>> +Fail if the IOMMU context limit of the guest is reached.
>> +
>> +A flag can be specified to create a identity mapping.
>> +
>> +## Free context
>> +
>> +Destroy a IOMMU context created previously.
>> +It is not possible to free the default context.
>> +
>> +Reattach context devices to default context if specified by the guest.
>> +
>> +Fail if there is a device in the context and reattach-to-default flag
>> is not
>> +specified.
>> +
>> +## Reattach device
>> +
>> +Reattach a device to another IOMMU context (including the default one).
>> +The target IOMMU context number must be valid and the context allocated.
>> +
>> +The guest needs to specify a PCI SBDF of a device he has access to.
>> +
>> +## Map/unmap page
>> +
>> +Map/unmap a page on a context.
>> +The guest needs to specify a gfn and target dfn to map.
>> +
>> +Refuse to create the mapping if one already exist for the same dfn.
>> +
>> +## Lookup page
>> +
>> +Get the gfn mapped by a specific dfn.
>> +
>> +## Remote command
>> +
>> +Make a PV-IOMMU operation on behalf of another domain.
>> +Especially useful for implementing IOMMU emulation (e.g using QEMU)
>> +or initializing PV-IOMMU with enforced limits.
>> +
>> +# Implementation considerations
>> +
>> +## Hypercall batching
>> +
>> +In order to prevent unneeded hypercalls and IOMMU flushing, it is
>> advisable to
>> +be able to batch some critical IOMMU operations (e.g map/unmap
>> multiple pages).
>> +
>
> I suppose that batching also implies preemption.
>

Yes, the current implementation does it, but I haven't updated to doc on
that aspect.

>> +## Hardware without IOMMU support
>> +
>> +Operating system needs to be aware on PV-IOMMU capability, and
>> whether it is
>> +able to make contexts. However, some operating system may critically
>> fail in
>> +case they are able to make a new IOMMU context. Which is supposed to
>> happen
>> +if no IOMMU hardware is available.
>> +
>> +The hypercall interface needs a interface to advertise the ability to
>> create
>> +and manage IOMMU contexts including the amount of context the guest
>> is able
>> +to use. Using these informations, the Dom0 may decide whether to use
>> or not
>> +the PV-IOMMU interface.
>> +
>> +## Page pool for contexts
>> +
>> +In order to prevent unexpected starving on the hypervisor memory with a
>> +buggy Dom0. We can preallocate the pages the contexts will use and make
>> +map/unmap use these pages instead of allocating them dynamically.
>> +
>
> Regards,
>    Frediano
>

Thanks
Teddy



Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech




Re: [PATCH 1/2] code style: Format ns16550 driver

2025-02-19 Thread Jan Beulich
On 19.02.2025 14:52, Oleksandr Andrushchenko wrote:
> On 19.02.25 15:18, Jan Beulich wrote:
>> On 19.02.2025 13:39, Oleksandr Andrushchenko wrote:
>>> On 17.02.25 12:20, Jan Beulich wrote:
 On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:
> @@ -248,8 +249,9 @@ static int cf_check ns16550_tx_ready(struct 
> serial_port *port)
>if ( ns16550_ioport_invalid(uart) )
>return -EIO;
>
> -return ( (ns_read_reg(uart, UART_LSR) &
> -  uart->lsr_mask ) == uart->lsr_mask ) ? uart->fifo_size : 0;
> +return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == 
> uart->lsr_mask)
> +   ? uart->fifo_size
> +   : 0;
 Indentation of the ? and : lines is clearly wrong here? What is the tool
 doing?
>>> There are number of options that have influence on this formatting:
>>> AllowShortBlocksOnASingleLine [4]
>>> BreakBeforeTernaryOperators [5]
>>> AlignOperands [6]
>>>
>>> I was not able to tweak these options to have the previous form.
>> Right, sticking to the original form (with just the stray blanks zapped)
>> would of course be best. Yet again - the tool is doing more transformations
>> despite there not being any need. If, however, it does so, then one of my
>> expectations would be that the ? and : are properly indented:
>>
>>  return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == 
>> uart->lsr_mask)
>> ? uart->fifo_size
>> : 0;
> This only differs from what the tool is doing by the fact it applies
> the following rule: *IndentWidth: 4*, e.g. it has indented your construct
> by 4 spaces, see [1]. Which, IMO, is acceptable change.

I don't view this as acceptable. It falls in the same class then as

ns_write_reg(uart,
 UART_FCR,
 UART_FCR_ENABLE | UART_FCR_CLRX | UART_FCR_CLTX |
 UART_FCR_TRG14);

that I also commented on in my initial reply.

>> or
>>
>>  return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == 
>> uart->lsr_mask)
>> ? uart->fifo_size : 0;
>>
>> or
>>
>>  return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask
>>  ? uart->fifo_size
>>  : 0);
>>
>> (not going to list more variants which are all okay).
>>
>> In any event, a fundamental requirement of mine is that such a tool would
>> only apply adjustments when and where style is actively violated. I.e. in
>> the case here:
>>
>> return ((ns_read_reg(uart, UART_LSR) &
>>  uart->lsr_mask) == uart->lsr_mask) ? uart->fifo_size : 0;
>>
>> That's not overly neat wrapping, but in line with our style. If the other
>> form was demanded going forward, I'd be curious how you'd verbally
>> describe the requirement in ./CODING_STYLE.
> I believe this can be stated around the fact that we need to indent,
> e.g. apply the same rule as for other constructs already in use

Except here the tool didn't merely adjust indentation, but moved tokens
between lines.

> @@ -275,9 +277,10 @@ static void pci_serial_early_init(struct ns16550 
> *uart)
>#ifdef NS16550_PCI
>if ( uart->bar && uart->io_base >= 0x1 )
>{
> -pci_conf_write16(PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1],
> -  uart->ps_bdf[2]),
> - PCI_COMMAND, PCI_COMMAND_MEMORY);
> +pci_conf_write16(
> +PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1], 
> uart->ps_bdf[2]),
> +PCI_COMMAND,
> +PCI_COMMAND_MEMORY);
>return;
>}
 Hmm, transforming a well-formed block into another well-formed one. No
 gain? (Same again further down.)
>>> No, gain from human point of view
>>> But there is a gain that it is now formatted automatically.
>> See above: I'd first like to see a written, textual description for all these
>> requirements. After all it needs to be possible for a human to write code
>> that the tool then wouldn't try to re-arrange. Which in turn requires that
>> the restrictions / constraints on the layout are spelled out.
> Agree, the existing coding style document will require some extension:
> at least clarifications and addition of the rules not described yet.
>>   I'm not looking
>> forward to pass all my patches through such a tool. I can write style-
>> conforming code pretty well, with - of course - occasional oversights,
> Which the tool will allow not to have for less accurate developers

I fear I don't understand this reply of yours.

>>   right
>> now. And that in multiple projects all with different styles. I expect to be
>> in the position to do so also going forward. This, imo, requires that there
>> be left some room for variations. Which in turn requires that the tool would
>> leave alone anything that is not in conflict with the written down or defacto
>> style.
> Not sure it is possible with any tool at all: 

Re: [PATCH v2] x86/svm: Separate STI and VMRUN instructions in svm_asm_do_resume()

2025-02-19 Thread Oleksii Kurochko


On 2/18/25 3:45 PM, Andrew Cooper wrote:

On 18/02/2025 2:42 pm, Jan Beulich wrote:

On 18.02.2025 15:37, Andrew Cooper wrote:

There is a corner case in the VMRUN instruction where its INTR_SHADOW state
leaks into guest state if a VMExit occurs before the VMRUN is complete.  An
example of this could be taking #NPF due to event injection.

Xen can safely execute STI anywhere between CLGI and VMRUN, as CLGI blocks
external interrupts too.  However, an exception (while fatal) will appear to
be in an irqs-on region (as GIF isn't considered), so position the STI after
the speculation actions but prior to the GPR pops.

Link:https://lore.kernel.org/all/cadh9ctbs1ypme4acfgpnbwa10ca8ruak2go7542djmzgs4u...@mail.gmail.com/
Fixes: 66b245d9eaeb ("SVM: limit GIF=0 region")
Signed-off-by: Andrew Cooper
Reviewed-by: Jan Beulich
---
v2:
  * Move after the speculation actions.

Emailed out just for completeness.  I've queued it in my for-4.21 branch.

It'll want backporting, so I wonder if we should persuade Oleksii into
taking it for 4.20.


Based on that ...


If Oleksii is happy, I can put it into 4.20.


... Release-Acked-by: Oleksii Kurochko

Thanks.

~ Oleksii


Re: [PATCH 1/2] code style: Format ns16550 driver

2025-02-19 Thread Oleksandr Andrushchenko

Hello, Jan!

On 17.02.25 12:20, Jan Beulich wrote:

On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:

--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -14,7 +14,7 @@
   * abstracted.
   */
  #if defined(CONFIG_HAS_PCI) && defined(CONFIG_X86)
-# define NS16550_PCI
+#define NS16550_PCI
  #endif

Hmm. Either form ought to be okay, so the line would want leaving untouched.

This is controlled by PPIndentWidth option [1]:
we have it set to "-1" which means "When set to -1 (default) *IndentWidth*
is used also for preprocessor statements."

Unfortunately, I was not able to see a change with PPIndentWidth + IndentWidth

PPIndentWidth: -1
IndentWidth: 4

--- test_define.c    2025-02-19 11:20:30.708096428 +0200
+++ test_define_minus1.c    2025-02-19 11:21:42.313013373 +0200
@@ -1,5 +1,5 @@
 #ifdef __linux__
-# define FOO
+#define FOO
 #else
-# define BAR
+#define BAR
 #endif

PPIndentWidth: 1
IndentWidth: 4

--- test_define.c    2025-02-19 11:20:30.708096428 +0200
+++ test_define_1.c    2025-02-19 11:23:46.986618706 +0200
@@ -1,5 +1,5 @@
 #ifdef __linux__
-# define FOO
+#define FOO
 #else
-# define BAR
+#define BAR
 #endif

I thought it is a bug in the latest clang-format (21), but I was not able to
get the expected result with clang-format 18 as well. I am not sure if
I'm doing anything wrong with .clang-format, but this works one way
for me as ow now.
Takeaway: I was not able to control this, but the result seems to be acceptable
It should be noted in the coding style though

@@ -43,12 +43,12 @@
  
  static struct ns16550 {

  int baud, clock_hz, data_bits, parity, stop_bits, fifo_size, irq;
-u64 io_base;   /* I/O port or memory-mapped I/O address. */
+u64 io_base; /* I/O port or memory-mapped I/O address. */
  u64 io_size;
  int reg_shift; /* Bits to shift register offset by */

Breaking alignment between comments (also in the immediately following hunk),
while at the same time ...

This one...

  int reg_width; /* Size of access to use, the registers
  * themselves are still bytes */

... not taking care of the comment style violation here?

This is controlled by ReflowComments option [3]:

From the tool point of view the comment is formatted correctly
I didn't find a way to convert such comments into
/*
 * Size of access to use, the registers * themselves are still bytes */ if this 
is what you mean.

@@ -63,8 +63,8 @@ static struct ns16550 {
  bool dw_usr_bsy;
  #ifdef NS16550_PCI
  /* PCI card parameters. */
-bool pb_bdf_enable; /* if =1, pb-bdf effective, port behind bridge */
-bool ps_bdf_enable; /* if =1, ps_bdf effective, port on pci card */
+bool pb_bdf_enable; /* if =1, pb-bdf effective, port behind bridge */
+bool ps_bdf_enable; /* if =1, ps_bdf effective, port on pci card */

(Just leaving this here for context of the earlier comment.)

... and this one. It seems that the tool tries to always have a single space 
before
the comment. There is an option SpacesBeforeTrailingComments [2] which
unfortunately only controls C++ style comments and "...does not affect trailing
block comments (|/*| - comments)..."
So, it seems that this is the rule to consider

@@ -248,8 +249,9 @@ static int cf_check ns16550_tx_ready(struct serial_port 
*port)
  if ( ns16550_ioport_invalid(uart) )
  return -EIO;
  
-return ( (ns_read_reg(uart, UART_LSR) &

-  uart->lsr_mask ) == uart->lsr_mask ) ? uart->fifo_size : 0;
+return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask)
+   ? uart->fifo_size
+   : 0;

Indentation of the ? and : lines is clearly wrong here? What is the tool
doing?

There are number of options that have influence on this formatting:
AllowShortBlocksOnASingleLine [4]
BreakBeforeTernaryOperators [5]
AlignOperands [6]

I was not able to tweak these options to have the previous form.

@@ -275,9 +277,10 @@ static void pci_serial_early_init(struct ns16550 *uart)
  #ifdef NS16550_PCI
  if ( uart->bar && uart->io_base >= 0x1 )
  {
-pci_conf_write16(PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1],
-  uart->ps_bdf[2]),
- PCI_COMMAND, PCI_COMMAND_MEMORY);
+pci_conf_write16(
+PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1], uart->ps_bdf[2]),
+PCI_COMMAND,
+PCI_COMMAND_MEMORY);
  return;
  }

Hmm, transforming a well-formed block into another well-formed one. No
gain? (Same again further down.)

No, gain from human point of view
But there is a gain that it is now formatted automatically.

@@ -335,14 +338,14 @@ static void ns16550_setup_preirq(struct ns16550 *uart)
  else
  {
  /* Baud rate already set: read it out from the divisor latch. */
-divisor  = ns_read_reg(uart, UART_DLL);
+divisor = ns_read_reg(uart, UART_DLL);
  divisor |= ns_read_reg(uart, UART_DLM) << 8;

An example where tabu

Re: [PATCH 1/2] code style: Format ns16550 driver

2025-02-19 Thread Jan Beulich
On 19.02.2025 13:39, Oleksandr Andrushchenko wrote:
> On 17.02.25 12:20, Jan Beulich wrote:
>> On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:
>>> @@ -43,12 +43,12 @@
>>>   
>>>   static struct ns16550 {
>>>   int baud, clock_hz, data_bits, parity, stop_bits, fifo_size, irq;
>>> -u64 io_base;   /* I/O port or memory-mapped I/O address. */
>>> +u64 io_base; /* I/O port or memory-mapped I/O address. */
>>>   u64 io_size;
>>>   int reg_shift; /* Bits to shift register offset by */
>> Breaking alignment between comments (also in the immediately following hunk),
>> while at the same time ...
> This one...
>>>   int reg_width; /* Size of access to use, the registers
>>>   * themselves are still bytes */
>> ... not taking care of the comment style violation here?
> This is controlled by ReflowComments option [3]:
> 
>  From the tool point of view the comment is formatted correctly
> I didn't find a way to convert such comments into
> /*
>   * Size of access to use, the registers * themselves are still bytes */ if 
> this is what you mean.

Above you see what I received. I can't really deduce from this what
formatting you suggested. In the case at hand, though, I think it's not
the best idea anyway to put a multi-line comment past a declaration (or
statement).

   /*
* Size of access to use, the registers
* themselves are still bytes
*/
   int reg_width;

is what I think would be better in such a case (artificially keeping
this to be multi-line, even if it looks as if it might fit on just one
line then).

>>> @@ -248,8 +249,9 @@ static int cf_check ns16550_tx_ready(struct serial_port 
>>> *port)
>>>   if ( ns16550_ioport_invalid(uart) )
>>>   return -EIO;
>>>   
>>> -return ( (ns_read_reg(uart, UART_LSR) &
>>> -  uart->lsr_mask ) == uart->lsr_mask ) ? uart->fifo_size : 0;
>>> +return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == 
>>> uart->lsr_mask)
>>> +   ? uart->fifo_size
>>> +   : 0;
>> Indentation of the ? and : lines is clearly wrong here? What is the tool
>> doing?
> There are number of options that have influence on this formatting:
> AllowShortBlocksOnASingleLine [4]
> BreakBeforeTernaryOperators [5]
> AlignOperands [6]
> 
> I was not able to tweak these options to have the previous form.

Right, sticking to the original form (with just the stray blanks zapped)
would of course be best. Yet again - the tool is doing more transformations
despite there not being any need. If, however, it does so, then one of my
expectations would be that the ? and : are properly indented:

return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask)
   ? uart->fifo_size
   : 0;

or

return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask)
   ? uart->fifo_size : 0;

or

return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask
? uart->fifo_size
: 0);

(not going to list more variants which are all okay).

In any event, a fundamental requirement of mine is that such a tool would
only apply adjustments when and where style is actively violated. I.e. in
the case here:

   return ((ns_read_reg(uart, UART_LSR) &
uart->lsr_mask) == uart->lsr_mask) ? uart->fifo_size : 0;

That's not overly neat wrapping, but in line with our style. If the other
form was demanded going forward, I'd be curious how you'd verbally
describe the requirement in ./CODING_STYLE.

>>> @@ -275,9 +277,10 @@ static void pci_serial_early_init(struct ns16550 *uart)
>>>   #ifdef NS16550_PCI
>>>   if ( uart->bar && uart->io_base >= 0x1 )
>>>   {
>>> -pci_conf_write16(PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1],
>>> -  uart->ps_bdf[2]),
>>> - PCI_COMMAND, PCI_COMMAND_MEMORY);
>>> +pci_conf_write16(
>>> +PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1], uart->ps_bdf[2]),
>>> +PCI_COMMAND,
>>> +PCI_COMMAND_MEMORY);
>>>   return;
>>>   }
>> Hmm, transforming a well-formed block into another well-formed one. No
>> gain? (Same again further down.)
> No, gain from human point of view
> But there is a gain that it is now formatted automatically.

See above: I'd first like to see a written, textual description for all these
requirements. After all it needs to be possible for a human to write code
that the tool then wouldn't try to re-arrange. Which in turn requires that
the restrictions / constraints on the layout are spelled out. I'm not looking
forward to pass all my patches through such a tool. I can write style-
conforming code pretty well, with - of course - occasional oversights, right
now. And that in multiple projects all with different styles. I expect to be
in the position to do so also going forward. This, imo, requires that there
be left some room for variations. Which in turn requir

Re: [PATCH 0/2] code style exercise: Drivers folder samples

2025-02-19 Thread Oleksandr Andrushchenko




On 19.02.25 14:51, Oleksandr Andrushchenko wrote:

Hello, Andrew!

On 19.02.25 14:49, Andrew Cooper wrote:

On 16/02/2025 5:11 pm, Oleksandr Andrushchenko wrote:

Hello, Roger!

Please find the branch with all the conversions [1].
Unfortunately I cannot provide a branch as seen with
diff --ignore-all-space as such a patch will not simply apply.

Stay safe,
Oleksandr Andrushchenko

On 16.02.25 13:58, Andrew Cooper wrote:

On 16/02/2025 10:21 am, Oleksandr Andrushchenko wrote:

There are two diff files which show what happens in case the same is
applied to the whole xen/drivers directory:
- first one is the result of the "git diff" command, 1.2M [3]
- the second one is for "git diff --ignire-all-space", 600K [4]

Please can you format everything, and put it on a branch somewhere, so
people can browse.

~Andrew

[1] https://github.com/andr2000/xen/tree/clang_ml_drivers_v002_diff

force pushed to the same branch for all Xen

That appears to only be drivers/

I thought that was the agreement, so we start from the drivers first


Please do *everything*.  I want to see what this does to files I
consider to be pretty clean Xen-style.

Sure


~Andrew


Thank you,
Oleksandr





Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-19 Thread Joel Fernandes
On Fri, Jan 17, 2025 at 05:53:33PM +0100, Valentin Schneider wrote:
> On 17/01/25 16:52, Jann Horn wrote:
> > On Fri, Jan 17, 2025 at 4:25 PM Valentin Schneider  
> > wrote:
> >> On 14/01/25 19:16, Jann Horn wrote:
> >> > On Tue, Jan 14, 2025 at 6:51 PM Valentin Schneider  
> >> > wrote:
> >> >> vunmap()'s issued from housekeeping CPUs are a relatively common source 
> >> >> of
> >> >> interference for isolated NOHZ_FULL CPUs, as they are hit by the
> >> >> flush_tlb_kernel_range() IPIs.
> >> >>
> >> >> Given that CPUs executing in userspace do not access data in the vmalloc
> >> >> range, these IPIs could be deferred until their next kernel entry.
> >> >>
> >> >> Deferral vs early entry danger zone
> >> >> ===
> >> >>
> >> >> This requires a guarantee that nothing in the vmalloc range can be 
> >> >> vunmap'd
> >> >> and then accessed in early entry code.
> >> >
> >> > In other words, it needs a guarantee that no vmalloc allocations that
> >> > have been created in the vmalloc region while the CPU was idle can
> >> > then be accessed during early entry, right?
> >>
> >> I'm not sure if that would be a problem (not an mm expert, please do
> >> correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't
> >> deferred anyway.
> >
> > flush_cache_vmap() is about stuff like flushing data caches on
> > architectures with virtually indexed caches; that doesn't do TLB
> > maintenance. When you look for its definition on x86 or arm64, you'll
> > see that they use the generic implementation which is simply an empty
> > inline function.
> >
> >> So after vmapping something, I wouldn't expect isolated CPUs to have
> >> invalid TLB entries for the newly vmapped page.
> >>
> >> However, upon vunmap'ing something, the TLB flush is deferred, and thus
> >> stale TLB entries can and will remain on isolated CPUs, up until they
> >> execute the deferred flush themselves (IOW for the entire duration of the
> >> "danger zone").
> >>
> >> Does that make sense?
> >
> > The design idea wrt TLB flushes in the vmap code is that you don't do
> > TLB flushes when you unmap stuff or when you map stuff, because doing
> > TLB flushes across the entire system on every vmap/vunmap would be a
> > bit costly; instead you just do batched TLB flushes in between, in
> > __purge_vmap_area_lazy().
> >
> > In other words, the basic idea is that you can keep calling vmap() and
> > vunmap() a bunch of times without ever doing TLB flushes until you run
> > out of virtual memory in the vmap region; then you do one big TLB
> > flush, and afterwards you can reuse the free virtual address space for
> > new allocations again.
> >
> > So if you "defer" that batched TLB flush for CPUs that are not
> > currently running in the kernel, I think the consequence is that those
> > CPUs may end up with incoherent TLB state after a reallocation of the
> > virtual address space.
> >
> 
> Ah, gotcha, thank you for laying this out! In which case yes, any vmalloc
> that occurred while an isolated CPU was NOHZ-FULL can be an issue if said
> CPU accesses it during early entry;

So the issue is:

CPU1: unmappes vmalloc page X which was previously mapped to physical page
P1.

CPU2: does a whole bunch of vmalloc and vfree eventually crossing some lazy
threshold and sending out IPIs. It then goes ahead and does an allocation
that maps the same virtual page X to physical page P2.

CPU3 is isolated and executes some early entry code before receving said IPIs
which are supposedly deferred by Valentin's patches.

It does not receive the IPI becuase it is deferred, thus access by early
entry code to page X on this CPU results in a UAF access to P1.

Is that the issue?

thanks,

 - Joel




Re: [PATCH for 4.20? v4 3/3] xen/riscv: update mfn calculation in pt_mapping_level()

2025-02-19 Thread Jan Beulich
On 19.02.2025 15:46, Oleksii Kurochko wrote:
> On 2/19/25 12:28 PM, Jan Beulich wrote:
>> On 12.02.2025 17:50, Oleksii Kurochko wrote:
>>> +else
>>>   {
>>> -rc = pt_next_level(alloc_tbl, &table, offsets[level]);
>>> -if ( rc == XEN_TABLE_MAP_NOMEM )
>>> +pte_t *table;
>>> +unsigned int level = HYP_PT_ROOT_LEVEL;
>>> +/* convenience aliases */
>> Nit: Style.
> 
>  From the 'Comments' section of CODING_STYLE, I see that the comment should 
> start
> with capital letter. Do you mean that?

In the (earlier) reply to "xen/riscv: identify specific ISA supported by cpu"
I said precisely that. I didn't think I'd need to repeat it here. So yes.

Jan



Re: [PATCH v3 1/2] xen/passthrough: Provide stub functions when !HAS_PASSTHROUGH

2025-02-19 Thread Jan Beulich
On 19.02.2025 16:25, Luca Fancellu wrote:
>> On 19 Feb 2025, at 13:30, Jan Beulich  wrote:
>> On 19.02.2025 14:06, Luca Fancellu wrote:
 On 19 Feb 2025, at 12:45, Jan Beulich  wrote:
 As per the
 respective revlog entry, this change looks to belong into whatever is
 going to be done to deal with the one Arm use of the macro. Or maybe
 it's unneeded altogether.
>>>
>>> I didn’t understand that you were opposing to protecting iommu_use_hap_pt() 
>>> when
>>> !HAS_PASSTHROUGH, I thought you were referring only to the stub in the #else
>>> branch.
>>> Can I ask why?
>>
>> Sure. And no, I'm not against the extra protection. I'm against unnecessary
>> code churn. That is, any such a re-arrangement wants to have some kind of
>> justification.
> 
> ok, yes the justification is that MPU system will be built with 
> !HAS_PASSTHROUGH,
> but there is a common function (p2m_set_way_flush) to MMU/MPU subsystem that
> I would like to keep common, to do so I have to hide the macro in this 
> particular
> configuration and afterwards I have two choices:
> 
> 1) provide a stub implementation on the Arm side
> 2) provide a stub implementation in iommu.h
> 
> number 2 felt better because it could be applied on any Xen configuration 
> without
> HAS_PASSTHROUGH, even if at the moment there is only MPU.
> 
> Number 1 let the possibility for the specific configuration to choose what to 
> do in absence
> of HAS_PASSTHROUGH.
> 
> Now I would like your view on what would be acceptable here.

I think I indicated earlier that I'd like the Arm maintainers to voice
their preference. Doing it in iommu.h may be okay, but also may not be.
Yet to decide that very Arm use of the macro needs taking into account,
and I lack context there.

>>> in any case when !HAS_PASSTHROUGH, this macro is not usable
>>> since dom_iommu() is resolved to a membed that doesn’t exist in the 
>>> configuration,
>>> am I missing something?
>>
>> You very likely aren't, yet the macro's presence also does no harm. We
>> have lots of macros and declarations which are usable only in certain
>> configurations. Sometimes this just happens to be that way, sometimes it's
>> actually deliberate (e.g. to facilitate DCE).
> 
> Ok, in this particular case, as I explained above, this macro is one of the 
> thing preventing
> Arm MPU side to build, otherwise I wouldn’t have touched it.

Yes, except that this wasn't said anywhere. Also if you mean to take
care of this macro here, then in full please. I.e. either don't touch
that area of the header at all, or provide (wherever suitable) a
stub alongside moving the #ifdef.

Jan



Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-19 Thread Valentin Schneider
On 19/02/25 10:05, Joel Fernandes wrote:
> On Fri, Jan 17, 2025 at 05:53:33PM +0100, Valentin Schneider wrote:
>> On 17/01/25 16:52, Jann Horn wrote:
>> > On Fri, Jan 17, 2025 at 4:25 PM Valentin Schneider  
>> > wrote:
>> >> On 14/01/25 19:16, Jann Horn wrote:
>> >> > On Tue, Jan 14, 2025 at 6:51 PM Valentin Schneider 
>> >> >  wrote:
>> >> >> vunmap()'s issued from housekeeping CPUs are a relatively common 
>> >> >> source of
>> >> >> interference for isolated NOHZ_FULL CPUs, as they are hit by the
>> >> >> flush_tlb_kernel_range() IPIs.
>> >> >>
>> >> >> Given that CPUs executing in userspace do not access data in the 
>> >> >> vmalloc
>> >> >> range, these IPIs could be deferred until their next kernel entry.
>> >> >>
>> >> >> Deferral vs early entry danger zone
>> >> >> ===
>> >> >>
>> >> >> This requires a guarantee that nothing in the vmalloc range can be 
>> >> >> vunmap'd
>> >> >> and then accessed in early entry code.
>> >> >
>> >> > In other words, it needs a guarantee that no vmalloc allocations that
>> >> > have been created in the vmalloc region while the CPU was idle can
>> >> > then be accessed during early entry, right?
>> >>
>> >> I'm not sure if that would be a problem (not an mm expert, please do
>> >> correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't
>> >> deferred anyway.
>> >
>> > flush_cache_vmap() is about stuff like flushing data caches on
>> > architectures with virtually indexed caches; that doesn't do TLB
>> > maintenance. When you look for its definition on x86 or arm64, you'll
>> > see that they use the generic implementation which is simply an empty
>> > inline function.
>> >
>> >> So after vmapping something, I wouldn't expect isolated CPUs to have
>> >> invalid TLB entries for the newly vmapped page.
>> >>
>> >> However, upon vunmap'ing something, the TLB flush is deferred, and thus
>> >> stale TLB entries can and will remain on isolated CPUs, up until they
>> >> execute the deferred flush themselves (IOW for the entire duration of the
>> >> "danger zone").
>> >>
>> >> Does that make sense?
>> >
>> > The design idea wrt TLB flushes in the vmap code is that you don't do
>> > TLB flushes when you unmap stuff or when you map stuff, because doing
>> > TLB flushes across the entire system on every vmap/vunmap would be a
>> > bit costly; instead you just do batched TLB flushes in between, in
>> > __purge_vmap_area_lazy().
>> >
>> > In other words, the basic idea is that you can keep calling vmap() and
>> > vunmap() a bunch of times without ever doing TLB flushes until you run
>> > out of virtual memory in the vmap region; then you do one big TLB
>> > flush, and afterwards you can reuse the free virtual address space for
>> > new allocations again.
>> >
>> > So if you "defer" that batched TLB flush for CPUs that are not
>> > currently running in the kernel, I think the consequence is that those
>> > CPUs may end up with incoherent TLB state after a reallocation of the
>> > virtual address space.
>> >
>>
>> Ah, gotcha, thank you for laying this out! In which case yes, any vmalloc
>> that occurred while an isolated CPU was NOHZ-FULL can be an issue if said
>> CPU accesses it during early entry;
>
> So the issue is:
>
> CPU1: unmappes vmalloc page X which was previously mapped to physical page
> P1.
>
> CPU2: does a whole bunch of vmalloc and vfree eventually crossing some lazy
> threshold and sending out IPIs. It then goes ahead and does an allocation
> that maps the same virtual page X to physical page P2.
>
> CPU3 is isolated and executes some early entry code before receving said IPIs
> which are supposedly deferred by Valentin's patches.
>
> It does not receive the IPI becuase it is deferred, thus access by early
> entry code to page X on this CPU results in a UAF access to P1.
>
> Is that the issue?
>

Pretty much so yeah. That is, *if* there such a vmalloc'd address access in
early entry code - testing says it's not the case, but I haven't found a
way to instrumentally verify this.




Re: [PATCH v6] Avoid crash calling PrintErrMesg from efi_multiboot2

2025-02-19 Thread Frediano Ziglio
On Mon, Feb 17, 2025 at 4:56 PM Jan Beulich  wrote:
>
> On 17.02.2025 17:52, Frediano Ziglio wrote:
> > On Mon, Feb 17, 2025 at 4:41 PM Andrew Cooper  
> > wrote:
> >>
> >> On 17/02/2025 4:31 pm, Jan Beulich wrote:
> >>> On 17.02.2025 17:26, Frediano Ziglio wrote:
>  --- a/xen/common/efi/efi-common.mk
>  +++ b/xen/common/efi/efi-common.mk
>  @@ -2,6 +2,7 @@ EFIOBJ-y := boot.init.o pe.init.o ebmalloc.o runtime.o
>   EFIOBJ-$(CONFIG_COMPAT) += compat.o
> 
>   CFLAGS-y += -fshort-wchar
>  +CFLAGS-y += -fno-jump-tables
>   CFLAGS-y += -iquote $(srctree)/common/efi
>   CFLAGS-y += -iquote $(srcdir)
> >>> Do source files other than boot.c really need this? Do any other files 
> >>> outside
> >>> of efi/ maybe also need this (iirc this point was made along with the v5 
> >>> comment
> >>> you got)?
> >>
> >> Every TU reachable from efi_multiboot2() needs this, because all of
> >> those have logic executed at an incorrect virtual address.
> >>
> >> ~Andrew
> >
> > I assume all the files in efi directory will deal with EFI boot so
> > it's good to have the option enabled for the files inside the
> > directory. The only exception seems runtime.c file.
>
> And compat.c, being a wrapper around runtime.c.
>
> > About external dependencies not sure how to detect them (beside
> > looking at all symbols).
>
> Which raises the question whether we don't need to turn on that option
> globally, without (as we have it now) any condition.
>
> Jan

I would avoid adding a potential option that could affect performances
so badly at the moment.
These changes are pretty contained.
I would merge this patch and check any external dependencies as a follow up.

Frediano



[PATCH v3 3/3] x86/dom0: be less restrictive with the Interrupt Address Range

2025-02-19 Thread Roger Pau Monne
Xen currently prevents dom0 from creating CPU or IOMMU page-table mappings
into the interrupt address range [0xfee0, 0xfeef].  This range has
two different purposes.  For accesses from the CPU is contains the default
position of local APIC page at 0xfee0.  For accesses from devices
it's the MSI address range, so the address field in the MSI entries
(usually) point to an address on that range to trigger an interrupt.

There are reports of Lenovo Thinkpad devices placing what seems to be the
UCSI shared mailbox at address 0xfeec2000 in the interrupt address range.
Attempting to use that device with a Linux PV dom0 leads to an error when
Linux kernel maps 0xfeec2000:

RIP: e030:xen_mc_flush+0x1e8/0x2b0
 xen_leave_lazy_mmu+0x15/0x60
 vmap_range_noflush+0x408/0x6f0
 __ioremap_caller+0x20d/0x350
 acpi_os_map_iomem+0x1a3/0x1c0
 acpi_ex_system_memory_space_handler+0x229/0x3f0
 acpi_ev_address_space_dispatch+0x17e/0x4c0
 acpi_ex_access_region+0x28a/0x510
 acpi_ex_field_datum_io+0x95/0x5c0
 acpi_ex_extract_from_field+0x36b/0x4e0
 acpi_ex_read_data_from_field+0xcb/0x430
 acpi_ex_resolve_node_to_value+0x2e0/0x530
 acpi_ex_resolve_to_value+0x1e7/0x550
 acpi_ds_evaluate_name_path+0x107/0x170
 acpi_ds_exec_end_op+0x392/0x860
 acpi_ps_parse_loop+0x268/0xa30
 acpi_ps_parse_aml+0x221/0x5e0
 acpi_ps_execute_method+0x171/0x3e0
 acpi_ns_evaluate+0x174/0x5d0
 acpi_evaluate_object+0x167/0x440
 acpi_evaluate_dsm+0xb6/0x130
 ucsi_acpi_dsm+0x53/0x80
 ucsi_acpi_read+0x2e/0x60
 ucsi_register+0x24/0xa0
 ucsi_acpi_probe+0x162/0x1e3
 platform_probe+0x48/0x90
 really_probe+0xde/0x340
 __driver_probe_device+0x78/0x110
 driver_probe_device+0x1f/0x90
 __driver_attach+0xd2/0x1c0
 bus_for_each_dev+0x77/0xc0
 bus_add_driver+0x112/0x1f0
 driver_register+0x72/0xd0
 do_one_initcall+0x48/0x300
 do_init_module+0x60/0x220
 __do_sys_init_module+0x17f/0x1b0
 do_syscall_64+0x82/0x170

Remove the restrictions to create mappings the interrupt address range for
dom0.  Note that the restriction to map the local APIC page is enforced
separately, and that continues to be present.  Additionally make sure the
emulated local APIC page is also not mapped, in case dom0 is using it.

Note that even if the interrupt address range entries are populated in the
IOMMU page-tables no device access will reach those pages.  Device accesses
to the Interrupt Address Range will always be converted into Interrupt
Messages and are not subject to DMA remapping.

There's also the following restriction noted in Intel VT-d:

> Software must not program paging-structure entries to remap any address to
> the interrupt address range. Untranslated requests and translation requests
> that result in an address in the interrupt range will be blocked with
> condition code LGN.4 or SGN.8. Translated requests with an address in the
> interrupt address range are treated as Unsupported Request (UR).

Similarly for AMD-Vi:

> Accesses to the interrupt address range (Table 3) are defined to go through
> the interrupt remapping portion of the IOMMU and not through address
> translation processing. Therefore, when a transaction is being processed as
> an interrupt remapping operation, the transaction attribute of
> pretranslated or untranslated is ignored.
>
> Software Note: The IOMMU should
> not be configured such that an address translation results in a special
> address such as the interrupt address range.

However those restrictions don't apply to the identity mappings possibly
created for dom0, since the interrupt address range is never subject to DMA
remapping, and hence there's no output address after translation that
belongs to the interrupt address range.

Reported-by: Jürgen Groß 
Link: 
https://lore.kernel.org/xen-devel/baade0a7-e204-4743-bda1-282df74e5...@suse.com/
Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Make sure vlapic page is not mapped.

Changes since v1:
 - No longer needs to modify arch_iommu_hwdom_init().
---
 xen/arch/x86/dom0_build.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index a735e3650c28..3cda0c2c8765 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -554,6 +554,12 @@ int __init dom0_setup_permissions(struct domain *d)
 mfn = paddr_to_pfn(mp_lapic_addr);
 rc |= iomem_deny_access(d, mfn, mfn);
 }
+/* If using an emulated local APIC make sure its MMIO is unpopulated. */
+if ( has_vlapic(d) )
+{
+mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE);
+rc |= iomem_deny_access(d, mfn, mfn);
+}
 /* I/O APICs. */
 for ( i = 0; i < nr_ioapics; i++ )
 {
@@ -563,10 +569,6 @@ int __init dom0_setup_permissions(struct domain *d)
  !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
 rc |= iomem_deny_access(d, mfn, mfn);
 }
-/* MSI range. */
-rc |= iomem_deny_access(d, paddr_to_pfn(MSI_ADDR_BASE_LO),
-paddr_to_pfn(MSI_ADDR_BAS

[PATCH v3 1/3] x86/dom0: correctly set the maximum ->iomem_caps bound for PVH

2025-02-19 Thread Roger Pau Monne
The logic in dom0_setup_permissions() sets the maximum bound in
->iomem_caps unconditionally using paddr_bits, which is not correct for HVM
based domains.  Instead use domain_max_paddr_bits() to get the correct
maximum paddr bits for each possible domain type.

Switch to using PFN_DOWN() instead of PAGE_SHIFT, as that's shorter.

Fixes: 53de839fb409 ('x86: constrain MFN range Dom0 may access')
Signed-off-by: Roger Pau Monné 
---
The fixes tag might be dubious, IIRC at that time we had PVHv1 dom0, which
would likely also need such adjustment, but not the current PVHv2.
---
Changes since v2:
 - New in this version.
---
 xen/arch/x86/dom0_build.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 3b9681dc9134..aec596997d5d 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -481,7 +481,8 @@ int __init dom0_setup_permissions(struct domain *d)
 
 /* The hardware domain is initially permitted full I/O capabilities. */
 rc = ioports_permit_access(d, 0, 0x);
-rc |= iomem_permit_access(d, 0UL, (1UL << (paddr_bits - PAGE_SHIFT)) - 1);
+rc |= iomem_permit_access(d, 0UL,
+  PFN_DOWN(1UL << domain_max_paddr_bits(d)) - 1);
 rc |= irqs_permit_access(d, 1, nr_irqs_gsi - 1);
 
 /* Modify I/O port access permissions. */
-- 
2.46.0




[PATCH v3 0/3] x86/dom0: be less restrictive with the Interrupt Address Range

2025-02-19 Thread Roger Pau Monne
Hello,

First two patches are preparatory changes to reduce the changes required
in patch 3.  I would have wanted those to go in 4.20 to fix the issues
on Lenovo Thinkpads, but it's too late now.

Thanks, Roger.

Roger Pau Monne (3):
  x86/dom0: correctly set the maximum ->iomem_caps bound for PVH
  x86/iommu: account for IOMEM caps when populating dom0 IOMMU
page-tables
  x86/dom0: be less restrictive with the Interrupt Address Range

 xen/arch/x86/dom0_build.c   | 24 ---
 xen/arch/x86/hvm/dom0_build.c   | 14 +++---
 xen/arch/x86/hvm/io.c   |  6 +--
 xen/arch/x86/include/asm/hvm/io.h   |  4 +-
 xen/drivers/passthrough/x86/iommu.c | 67 -
 5 files changed, 57 insertions(+), 58 deletions(-)

-- 
2.46.0




xl create/save throwing errors

2025-02-19 Thread Petr Beneš
Hello,

I have a script that's supposed to start a couple of (Windows 10) VMs
in parallel, wait until they boot and connect to the network, and then
create a live snapshot.

VMs are created by simple "xl create vm.cfg" and the live snapshot is
created by "xl save win10-18362-NNN path/to/state".

I have noticed, that "xl create" occasionally throws this line:
```
libxl: error: libxl_aoutils.c:646:libxl__kill_xs_path: qemu
command-line probe already exited
```

First I thought it's related to the fact that multiple "xl create"
commands are being run in parallel, but to my surprise, this line
sometimes occurs even for standalone "xl create" commands.

However, when "xl save" is being executed in parallel, I'm very often
met with output similar to this:
```
Saving to win10-18362-102/state new xl format (info 0x3/0x0/1780)
xc: info: Saving domain 193, type x86 HVM
Saving to win10-18362-101/state new xl format (info 0x3/0x0/1780)
xc: info: Saving domain 192, type x86 HVM
Saving to win10-18362-104/state new xl format (info 0x3/0x0/1780)
xc: info: Saving domain 194, type x86 HVM
xc: error: save callback suspend() failed: 0: Internal error
xc: error: Save failed (0 = Success): Internal error
libxl: error: libxl_stream_write.c:347:libxl__xc_domain_save_done:
Domain 192:saving domain: domain responded to suspend request: Success
Failed to save domain, resuming domain
xc: error: save callback suspend() failed: 0: Internal error
xc: error: Save failed (0 = Success): Internal error
xc: error: Dom 192 not suspended: (shutdown 4, reason 3): Internal error
libxl: error: libxl_dom_suspend.c:661:domain_resume_done: Domain
192:xc_domain_resume failed: Invalid argument
libxl: error: libxl_stream_write.c:347:libxl__xc_domain_save_done:
Domain 194:saving domain: domain responded to suspend request: Success
Failed to save domain, resuming domain
xc: error: Dom 194 not suspended: (shutdown 4, reason 3): Internal error
libxl: error: libxl_dom_suspend.c:661:domain_resume_done: Domain
194:xc_domain_resume failed: Invalid argument
xc: Frames: 1044480/1044480  100%: Frames: 52224/10444805%
```

Here's an output of snapshotting 4 live VMs in parallel, where 3 of
the commands failed, and left the VMs in a running state.

Note that each "xl create"/"xl save" is executed for a separate VM.

For several months, I have executed standalone "xl save" commands with
VMs of the same settings without any problems.

Note that my VMs use qcow2 images as their disks - not ZFS or LVM:
```
disk = [ 'tap:qcow2:/win10-18362-101/clone/image.qcow2,xvda,w' ]
```

where win10-18362-101/clone/image.qcow2 is created as:
```
qemu-img create -f qcow2 -F qcow2 -b
"/win10-18362-101/base/image.qcow2"
"/win10-18362-101/clone/image.qcow2"
```

Is running "xl save" in parallel not supported? Or is it an issue with
qcow2 handling?

Best,
Petr



Re: xl create/save throwing errors

2025-02-19 Thread Petr Beneš
On Wed, Feb 19, 2025 at 5:04 PM Petr Beneš  wrote:
>
> Hello,
>

To add more information and observations:

I'm running Xen 4.20-rc on a MFF Dell Optiplex, CPU is i5-12500T (6
cores, 12 threads). I have allocated 8 cores for dom0. Now:

- xl saving 4 vms, each with 4 VCPUs tend to fail
- xl saving 4 vms, each with 2 VCPUs didn't fail so far
- xl saving 8 vms, each with 2 VCPUs didn't fail so far
- xl saving 12 vms, each with 2 VCPUs didn't fail either

Note that there's always enough memory for all the VMs + dom0.

Also, I have observed new error lines when xl create is being executed
in parallel:
```
libxl: error: libxl_qmp.c:1399:qmp_ev_fd_callback: Domain 89:error on
QMP socket: Connection reset by peer
libxl: error: libxl_qmp.c:1438:qmp_ev_fd_callback: Domain 89:Error
happened with the QMP connection to QEMU
```



Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-19 Thread Joel Fernandes



On 2/19/2025 11:18 AM, Valentin Schneider wrote:
> On 19/02/25 10:05, Joel Fernandes wrote:
>> On Fri, Jan 17, 2025 at 05:53:33PM +0100, Valentin Schneider wrote:
>>> On 17/01/25 16:52, Jann Horn wrote:
 On Fri, Jan 17, 2025 at 4:25 PM Valentin Schneider  
 wrote:
> On 14/01/25 19:16, Jann Horn wrote:
>> On Tue, Jan 14, 2025 at 6:51 PM Valentin Schneider  
>> wrote:
>>> vunmap()'s issued from housekeeping CPUs are a relatively common source 
>>> of
>>> interference for isolated NOHZ_FULL CPUs, as they are hit by the
>>> flush_tlb_kernel_range() IPIs.
>>>
>>> Given that CPUs executing in userspace do not access data in the vmalloc
>>> range, these IPIs could be deferred until their next kernel entry.
>>>
>>> Deferral vs early entry danger zone
>>> ===
>>>
>>> This requires a guarantee that nothing in the vmalloc range can be 
>>> vunmap'd
>>> and then accessed in early entry code.
>>
>> In other words, it needs a guarantee that no vmalloc allocations that
>> have been created in the vmalloc region while the CPU was idle can
>> then be accessed during early entry, right?
>
> I'm not sure if that would be a problem (not an mm expert, please do
> correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't
> deferred anyway.

 flush_cache_vmap() is about stuff like flushing data caches on
 architectures with virtually indexed caches; that doesn't do TLB
 maintenance. When you look for its definition on x86 or arm64, you'll
 see that they use the generic implementation which is simply an empty
 inline function.

> So after vmapping something, I wouldn't expect isolated CPUs to have
> invalid TLB entries for the newly vmapped page.
>
> However, upon vunmap'ing something, the TLB flush is deferred, and thus
> stale TLB entries can and will remain on isolated CPUs, up until they
> execute the deferred flush themselves (IOW for the entire duration of the
> "danger zone").
>
> Does that make sense?

 The design idea wrt TLB flushes in the vmap code is that you don't do
 TLB flushes when you unmap stuff or when you map stuff, because doing
 TLB flushes across the entire system on every vmap/vunmap would be a
 bit costly; instead you just do batched TLB flushes in between, in
 __purge_vmap_area_lazy().

 In other words, the basic idea is that you can keep calling vmap() and
 vunmap() a bunch of times without ever doing TLB flushes until you run
 out of virtual memory in the vmap region; then you do one big TLB
 flush, and afterwards you can reuse the free virtual address space for
 new allocations again.

 So if you "defer" that batched TLB flush for CPUs that are not
 currently running in the kernel, I think the consequence is that those
 CPUs may end up with incoherent TLB state after a reallocation of the
 virtual address space.

>>>
>>> Ah, gotcha, thank you for laying this out! In which case yes, any vmalloc
>>> that occurred while an isolated CPU was NOHZ-FULL can be an issue if said
>>> CPU accesses it during early entry;
>>
>> So the issue is:
>>
>> CPU1: unmappes vmalloc page X which was previously mapped to physical page
>> P1.
>>
>> CPU2: does a whole bunch of vmalloc and vfree eventually crossing some lazy
>> threshold and sending out IPIs. It then goes ahead and does an allocation
>> that maps the same virtual page X to physical page P2.
>>
>> CPU3 is isolated and executes some early entry code before receving said IPIs
>> which are supposedly deferred by Valentin's patches.
>>
>> It does not receive the IPI becuase it is deferred, thus access by early
>> entry code to page X on this CPU results in a UAF access to P1.
>>
>> Is that the issue?
>>
> 
> Pretty much so yeah. That is, *if* there such a vmalloc'd address access in
> early entry code - testing says it's not the case, but I haven't found a
> way to instrumentally verify this.
Ok, thanks for confirming. Maybe there is an address sanitizer way of verifying,
but yeah it is subtle and there could be more than one way of solving it. Too
much 'fun' ;)

 - Joel





Re: [PATCH for 4.20? v4 3/3] xen/riscv: update mfn calculation in pt_mapping_level()

2025-02-19 Thread Oleksii Kurochko


On 2/19/25 4:05 PM, Jan Beulich wrote:

On 19.02.2025 15:46, Oleksii Kurochko wrote:

On 2/19/25 12:28 PM, Jan Beulich wrote:

On 12.02.2025 17:50, Oleksii Kurochko wrote:

+else
   {
-rc = pt_next_level(alloc_tbl, &table, offsets[level]);
-if ( rc == XEN_TABLE_MAP_NOMEM )
+pte_t *table;
+unsigned int level = HYP_PT_ROOT_LEVEL;
+/* convenience aliases */

Nit: Style.

  From the 'Comments' section of CODING_STYLE, I see that the comment should 
start
with capital letter. Do you mean that?

In the (earlier) reply to "xen/riscv: identify specific ISA supported by cpu"
I said precisely that. I didn't think I'd need to repeat it here. So yes.


Of course, it was enough. The problem was that I started to read and answer to 
this patch
series first and went to another (where you wrote that) one after.

Anyway thank you for clarifying.

~ Oleksii


Re: [PATCH for 4.21 v6 2/2] xen/riscv: identify specific ISA supported by cpu

2025-02-19 Thread Oleksii Kurochko


On 2/19/25 12:05 PM, Jan Beulich wrote:

On 12.02.2025 17:50, Oleksii Kurochko wrote:

--- /dev/null
+++ b/xen/arch/riscv/cpufeature.c
@@ -0,0 +1,502 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Originally taken for Linux kernel v6.12-rc3.
+ *
+ * Copyright (C) 2015 ARM Ltd.
+ * Copyright (C) 2017 SiFive
+ * Copyright (C) 2024 Vates
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#ifdef CONFIG_ACPI
+# error "cpufeature.c functions should be updated to support ACPI"
+#endif
+
+struct riscv_isa_ext_data {
+unsigned int id;
+const char *name;
+};
+
+#define RISCV_ISA_EXT_DATA(ext_name)\
+{   \
+.id = RISCV_ISA_EXT_##ext_name, \

Nit: ## being a binary operator (just for the pre-processor) we prefer
it, too, to be framed by blanks.


+/*
+ * The canonical order of ISA extension names in the ISA string is defined in
+ * chapter 27 of the unprivileged specification.
+ *
+ * The specification uses vague wording, such as should, when it comes to
+ * ordering, so for our purposes the following rules apply:
+ *
+ * 1. All multi-letter extensions must be separated from other extensions by an
+ *underscore.
+ *
+ * 2. Additional standard extensions (starting with 'Z') must be sorted after
+ *single-letter extensions and before any higher-privileged extensions.
+ *
+ * 3. The first letter following the 'Z' conventionally indicates the most
+ *closely related alphabetical extension category, IMAFDQLCBKJTPVH.
+ *If multiple 'Z' extensions are named, they must be ordered first by
+ *category, then alphabetically within a category.
+ *
+ * 4. Standard supervisor-level extensions (starting with 'S') must be listed
+ *after standard unprivileged extensions.  If multiple supervisor-level
+ *extensions are listed, they must be ordered alphabetically.
+ *
+ * 5. Standard machine-level extensions (starting with 'Zxm') must be listed
+ *after any lower-privileged, standard extensions.  If multiple
+ *machine-level extensions are listed, they must be ordered
+ *alphabetically.
+ *
+ * 6. Non-standard extensions (starting with 'X') must be listed after all
+ *standard extensions. If multiple non-standard extensions are listed, they
+ *must be ordered alphabetically.
+ *
+ * An example string following the order is:
+ *rv64imadc_zifoo_zigoo_zafoo_sbar_scar_zxmbaz_xqux_xrux
+ *
+ * New entries to this struct should follow the ordering rules described above.
+ *
+ * Extension name must be all lowercase (according to device-tree binding)
+ * and strncmp() is used in match_isa_ext() to compare extension names instead
+ * of strncasecmp().
+ */
+const struct riscv_isa_ext_data __initconst riscv_isa_ext[] = {
+RISCV_ISA_EXT_DATA(i),
+RISCV_ISA_EXT_DATA(m),
+RISCV_ISA_EXT_DATA(a),
+RISCV_ISA_EXT_DATA(f),
+RISCV_ISA_EXT_DATA(d),
+RISCV_ISA_EXT_DATA(q),
+RISCV_ISA_EXT_DATA(c),
+RISCV_ISA_EXT_DATA(h),
+RISCV_ISA_EXT_DATA(zicntr),
+RISCV_ISA_EXT_DATA(zicsr),
+RISCV_ISA_EXT_DATA(zifencei),
+RISCV_ISA_EXT_DATA(zihintpause),
+RISCV_ISA_EXT_DATA(zihpm),
+RISCV_ISA_EXT_DATA(zbb),

No Zba and Zbs here, despite there now being enumerators for them?


Missed to add them here. Now someone could try to ask if they are supported
by a CPU as we have that in enumerators. I will add it then.




+static int __init riscv_isa_parse_string(const char *isa,
+ unsigned long *out_bitmap)
+{
+if ( (isa[0] != 'r') && (isa[1] != 'v') )
+return -EINVAL;
+
+#if defined(CONFIG_RISCV_32)
+if ( isa[2] != '3' && isa[3] != '2' )
+return -EINVAL;
+#elif defined(CONFIG_RISCV_64)
+if ( isa[2] != '6' && isa[3] != '4' )
+return -EINVAL;
+#else
+# error "unsupported RISC-V bitness"
+#endif
+
+/*
+ * In unpriv. specification (*_20240411) is mentioned the following:
+ * (1) A RISC-V ISA is defined as a base integer ISA, which must be
+ * present in any implementation, plus optional extensions to
+ * the base ISA.
+ * (2) Chapter 6 describes the RV32E and RV64E subset variants of
+ * the RV32I or RV64I base instruction sets respectively, which
+ * have been added to support small microcontrollers, and which
+ * have half the number of integer registers.
+ *
+ * What means that isa should contain, at least, I or E.
+ *
+ * As Xen isn't expected to be run on microcontrollers and according
+ * to device tree binding the first extension should be "i".
+ */
+if ( isa[4] != 'i' )
+return -EINVAL;
+
+isa += 4;
+
+while ( *isa )
+{
+const char *ext = isa++;
+const char *ext_end = isa;
+
+switch ( *ext )
+{
+case 'x':
+printk_once("Vendor extensions are ignored in riscv,isa\n");
+/*
+ * To s

Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-19 Thread Valentin Schneider
On 18/02/25 16:39, Dave Hansen wrote:
> On 2/18/25 14:40, Valentin Schneider wrote:
>>> In practice, it's mostly limited like that.
>>>
>>> Architecturally, there are no promises from the CPU. It is within its
>>> rights to cache anything from the page tables at any time. If it's in
>>> the CR3 tree, it's fair game.
>>>
>> So what if the VMEMMAP range *isn't* in the CR3 tree when a CPU is
>> executing in userspace?
>>
>> AIUI that's the case with kPTI - the remaining kernel pages should mostly
>> be .entry.text and cpu_entry_area, at least for x86.
>
> Having part of VMEMMAP not in the CR3 tree should be harmless while
> running userspace. VMEMMAP is a purely software structure; the hardware
> doesn't do implicit supervisor accesses to it. It's also not needed in
> super early entry.
>
> Maybe I missed part of the discussion though. Is VMEMMAP your only
> concern? I would have guessed that the more generic vmalloc()
> functionality would be harder to pin down.

Urgh, that'll teach me to send emails that late - I did indeed mean the
vmalloc() range, not at all VMEMMAP. IIUC *neither* are present in the user
kPTI page table and AFAICT the page table swap is done before the actual vmap'd
stack (CONFIG_VMAP_STACK=y) gets used.




Re: [PATCH v3 1/2] xen/passthrough: Provide stub functions when !HAS_PASSTHROUGH

2025-02-19 Thread Luca Fancellu


> On 19 Feb 2025, at 13:30, Jan Beulich  wrote:
> 
> On 19.02.2025 14:06, Luca Fancellu wrote:
>>> On 19 Feb 2025, at 12:45, Jan Beulich  wrote:
>>> On 18.02.2025 10:51, Luca Fancellu wrote:
 --- a/xen/include/xen/iommu.h
 +++ b/xen/include/xen/iommu.h
 @@ -110,6 +110,8 @@ extern int8_t iommu_hwdom_reserved;
 
 extern unsigned int iommu_dev_iotlb_timeout;
 
 +#ifdef CONFIG_HAS_PASSTHROUGH
 +
 int iommu_setup(void);
 int iommu_hardware_setup(void);
 
 @@ -122,6 +124,28 @@ int arch_iommu_domain_init(struct domain *d);
 void arch_iommu_check_autotranslated_hwdom(struct domain *d);
 void arch_iommu_hwdom_init(struct domain *d);
 
 +#else
 +
 +static inline int iommu_setup(void)
 +{
 +return -ENODEV;
 +}
 +
 +static inline int iommu_domain_init(struct domain *d, unsigned int opts)
 +{
 +/*
 + * When !HAS_PASSTHROUGH, iommu_enabled is set to false and the 
 expected
 + * behaviour of this function is to return success in that case.
 + */
 +return 0;
 +}
>>> 
>>> Hmm. Would the function be anywhere near likely to do anything else than
>>> what it's expected to do? My original concern here was with "opts"
>>> perhaps asking for something that cannot be supported. But that was wrong
>>> thinking on my part. Here what you do is effectively open-code what the
>>> real iommu_domain_init() would do: Return success when !is_iommu_enabled().
>>> Which in turn follows from !iommu_enabled when !HAS_PASSTHROUGH.
>>> 
>>> On that basis I'd be okay if the comment was dropped again. Else it imo
>>> wants re-wording to get closer to the explanation above.
>> 
>> Would it be ok for you a comment saying:
>> “This stub returns the same as the real iommu_domain_init()
>> function: success when !is_iommu_enabled(), which value is based on 
>> iommu_enabled
>> that is false when !HAS_PASSTHROUGH"
> 
> I'm sorry, but this is too verbose for my taste. What's wrong with the more
> terse
> 
> "Return as the real iommu_domain_init() would: Success when
> !is_iommu_enabled(), following from !iommu_enabled when !HAS_PASSTHROUGH"
> 
> ?

nothing wrong, I’ll use that, thanks for confirming.

> 
 @@ -383,12 +429,12 @@ struct domain_iommu {
 #define iommu_set_feature(d, f)   set_bit(f, dom_iommu(d)->features)
 #define iommu_clear_feature(d, f) clear_bit(f, dom_iommu(d)->features)
 
 +#ifdef CONFIG_HAS_PASSTHROUGH
 /* Are we using the domain P2M table as its IOMMU pagetable? */
 #define iommu_use_hap_pt(d)   (IS_ENABLED(CONFIG_HVM) && \
   dom_iommu(d)->hap_pt_share)
 
 /* Does the IOMMU pagetable need to be kept synchronized with the P2M */
 -#ifdef CONFIG_HAS_PASSTHROUGH
 #define need_iommu_pt_sync(d) (dom_iommu(d)->need_sync)
 
 int iommu_do_domctl(struct xen_domctl *domctl, struct domain *d,
>>> 
>>> Coming back to my v2 comment: Why exactly is this change needed here? If
>>> things build fine without the macro being defined when !HAS_PASSTHROUGH,
>>> surely they will also build fine with it being defined?
>> 
>> I’ve defined an empty stub on an header included only on MPU systems for the
>> p2m module, this is why it is building
> 
> But that wasn't part of the patch, was it? I.e. with this series alone
> applied, things still don't build?

yes, before building there are other bits needed, this is only a small step 
towards that.

> 
>> I didn’t modify p2m_set_way_flush() which lives in arm common code, because
>> it will be used also on MPU systems (R82 has MPU at EL2 but MMU/MPU at EL1)
>> and I would like to stay the same and be used by MMU/MPU subsystems.
>> 
>>> As per the
>>> respective revlog entry, this change looks to belong into whatever is
>>> going to be done to deal with the one Arm use of the macro. Or maybe
>>> it's unneeded altogether.
>> 
>> I didn’t understand that you were opposing to protecting iommu_use_hap_pt() 
>> when
>> !HAS_PASSTHROUGH, I thought you were referring only to the stub in the #else
>> branch.
>> Can I ask why?
> 
> Sure. And no, I'm not against the extra protection. I'm against unnecessary
> code churn. That is, any such a re-arrangement wants to have some kind of
> justification.

ok, yes the justification is that MPU system will be built with 
!HAS_PASSTHROUGH,
but there is a common function (p2m_set_way_flush) to MMU/MPU subsystem that
I would like to keep common, to do so I have to hide the macro in this 
particular
configuration and afterwards I have two choices:

1) provide a stub implementation on the Arm side
2) provide a stub implementation in iommu.h

number 2 felt better because it could be applied on any Xen configuration 
without
HAS_PASSTHROUGH, even if at the moment there is only MPU.

Number 1 let the possibility for the specific configuration to choose what to 
do in absence
of HAS_PASSTHROUGH.

Now I would like your view on what would be accep

Re: [PATCH 1/2] code style: Format ns16550 driver

2025-02-19 Thread Jan Beulich
On 19.02.2025 16:40, Oleksandr Andrushchenko wrote:
> On 19.02.25 16:05, Jan Beulich wrote:
>> On 19.02.2025 14:52, Oleksandr Andrushchenko wrote:
>>> On 19.02.25 15:18, Jan Beulich wrote:
 On 19.02.2025 13:39, Oleksandr Andrushchenko wrote:
> On 17.02.25 12:20, Jan Beulich wrote:
>> On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:
>>> @@ -248,8 +249,9 @@ static int cf_check ns16550_tx_ready(struct 
>>> serial_port *port)
>>> if ( ns16550_ioport_invalid(uart) )
>>> return -EIO;
>>> 
>>> -return ( (ns_read_reg(uart, UART_LSR) &
>>> -  uart->lsr_mask ) == uart->lsr_mask ) ? uart->fifo_size : 
>>> 0;
>>> +return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == 
>>> uart->lsr_mask)
>>> +   ? uart->fifo_size
>>> +   : 0;
>> Indentation of the ? and : lines is clearly wrong here? What is the tool
>> doing?
> There are number of options that have influence on this formatting:
> AllowShortBlocksOnASingleLine [4]
> BreakBeforeTernaryOperators [5]
> AlignOperands [6]
>
> I was not able to tweak these options to have the previous form.
 Right, sticking to the original form (with just the stray blanks zapped)
 would of course be best. Yet again - the tool is doing more transformations
 despite there not being any need. If, however, it does so, then one of my
 expectations would be that the ? and : are properly indented:

   return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == 
 uart->lsr_mask)
  ? uart->fifo_size
  : 0;
>>> This only differs from what the tool is doing by the fact it applies
>>> the following rule: *IndentWidth: 4*, e.g. it has indented your construct
>>> by 4 spaces, see [1]. Which, IMO, is acceptable change.
>> I don't view this as acceptable. It falls in the same class then as
>>
>>  ns_write_reg(uart,
>>   UART_FCR,
>>   UART_FCR_ENABLE | UART_FCR_CLRX | UART_FCR_CLTX |
>>   UART_FCR_TRG14);
>>
>> that I also commented on in my initial reply.
> Ok, then how would you have it defined in the coding style as a rule?
> Such a diversity in defining indentation? So, will you have a dedicated
> rule for the ternary?

Well, this feels like you're returning a request I made your way, elsewhere.
Our present, unwritten rule for wrapping lines is to match the earlier
line's indentation (or the start of the expression), plus accounting for any
pending open parentheses, braces, or brackets. Hence why some consider this
form

 ns_write_reg(uart,
  UART_FCR,
  (UART_FCR_ENABLE | UART_FCR_CLRX | UART_FCR_CLTX |
   UART_FCR_TRG14));

preferable, as some tools (iirc e.g. Andrew indicated his editor does) then
are capable of inferring the intended indentation from the pending open
parentheses.

 That's not overly neat wrapping, but in line with our style. If the other
 form was demanded going forward, I'd be curious how you'd verbally
 describe the requirement in ./CODING_STYLE.
>>> I believe this can be stated around the fact that we need to indent,
>>> e.g. apply the same rule as for other constructs already in use
>> Except here the tool didn't merely adjust indentation, but moved tokens
>> between lines.
> Again, if it moves, but doesn't break the style - then it is going to happen
> only once while applying big-scary-patch.

As to that patch: To some degree I actually like the idea of following Linux
in generally not allowing style-only patches.

>>> @@ -275,9 +277,10 @@ static void pci_serial_early_init(struct ns16550 
>>> *uart)
>>> #ifdef NS16550_PCI
>>> if ( uart->bar && uart->io_base >= 0x1 )
>>> {
>>> -pci_conf_write16(PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1],
>>> -  uart->ps_bdf[2]),
>>> - PCI_COMMAND, PCI_COMMAND_MEMORY);
>>> +pci_conf_write16(
>>> +PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1], 
>>> uart->ps_bdf[2]),
>>> +PCI_COMMAND,
>>> +PCI_COMMAND_MEMORY);
>>> return;
>>> }
>> Hmm, transforming a well-formed block into another well-formed one. No
>> gain? (Same again further down.)
> No, gain from human point of view
> But there is a gain that it is now formatted automatically.
 See above: I'd first like to see a written, textual description for all 
 these
 requirements. After all it needs to be possible for a human to write code
 that the tool then wouldn't try to re-arrange. Which in turn requires that
 the restrictions / constraints on the layout are spelled out.
>>> Agree, the existing coding style document will require some extension:
>>> at least clarifications and addition of the rules not describe

Re: [PATCH v2 1/2] xen/list: avoid UB in list iterators

2025-02-19 Thread Andrew Cooper
On 19/02/2025 2:18 pm, Juergen Gross wrote:
> The list_for_each_entry*() iterators are testing for having reached the
> end of the list in a way which relies on undefined behavior: the
> iterator (being a pointer to the struct of a list element) is advanced
> and only then tested to have reached not the next element, but the list
> head. This results in the list head being addressed via a list element
> pointer, which is undefined, in case the list elements have a higher
> alignment than the list head.
>
> Avoid that by testing for the end of the list before advancing the
> iterator. In case of having reached the end of the list, set the
> iterator to NULL and use that for stopping the loop. This has the
> additional advantage of not leaking the iterator pointing to something
> which isn't a list element past the loop.
>
> There is one case in the Xen code where the iterator is used after
> the loop: it is tested to be non-NULL to indicate the loop has run
> until reaching the end of the list. This case is modified to use the
> iterator being NULL for indicating the end of the list has been
> reached.
>
> Reported-by: Andrew Cooper 
> Signed-off-by: Juergen Gross 
> Release-Acked-by: Oleksii Kurochko 

I agree there's an issue here, but as said before, I do not agree with
this patch.

For starters, bloat-o-meter on a random top-of-tree build says

    add/remove: 8/1 grow/shrink: 112/68 up/down: 4314/-2855 (1459)

which is a horrible overhead for a case where the sequence of
instructions are correct (only the C level types are a problem) and ...

> ---
> No proper Fixes: tag, as this bug predates Xen's git and mercurial
> history.
> V2:
> - fix one use case (Jan Beulich)
> - let list_is_first() return bool, rename parameter (Jan Beulich)
> - paranthesize iterator when used as non-NULL test (Jan Beulich)
> - avoid dereferencing NULL in the safe iterators (Jan Beulich)
> ---
>  xen/drivers/passthrough/x86/hvm.c |   3 +-

... the need for this adjustment being discovered after-the-fact means
it's a very risky change at this juncture in the release.

~Andrew



Re: [PATCH 1/2] code style: Format ns16550 driver

2025-02-19 Thread Oleksandr Andrushchenko




On 19.02.25 16:05, Jan Beulich wrote:

On 19.02.2025 14:52, Oleksandr Andrushchenko wrote:

On 19.02.25 15:18, Jan Beulich wrote:

On 19.02.2025 13:39, Oleksandr Andrushchenko wrote:

On 17.02.25 12:20, Jan Beulich wrote:

On 16.02.2025 11:21, Oleksandr Andrushchenko wrote:

@@ -248,8 +249,9 @@ static int cf_check ns16550_tx_ready(struct serial_port 
*port)
if ( ns16550_ioport_invalid(uart) )
return -EIO;

-return ( (ns_read_reg(uart, UART_LSR) &

-  uart->lsr_mask ) == uart->lsr_mask ) ? uart->fifo_size : 0;
+return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask)
+   ? uart->fifo_size
+   : 0;

Indentation of the ? and : lines is clearly wrong here? What is the tool
doing?

There are number of options that have influence on this formatting:
AllowShortBlocksOnASingleLine [4]
BreakBeforeTernaryOperators [5]
AlignOperands [6]

I was not able to tweak these options to have the previous form.

Right, sticking to the original form (with just the stray blanks zapped)
would of course be best. Yet again - the tool is doing more transformations
despite there not being any need. If, however, it does so, then one of my
expectations would be that the ? and : are properly indented:

  return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask)
 ? uart->fifo_size
 : 0;

This only differs from what the tool is doing by the fact it applies
the following rule: *IndentWidth: 4*, e.g. it has indented your construct
by 4 spaces, see [1]. Which, IMO, is acceptable change.

I don't view this as acceptable. It falls in the same class then as

 ns_write_reg(uart,
  UART_FCR,
  UART_FCR_ENABLE | UART_FCR_CLRX | UART_FCR_CLTX |
  UART_FCR_TRG14);

that I also commented on in my initial reply.

Ok, then how would you have it defined in the coding style as a rule?
Such a diversity in defining indentation? So, will you have a dedicated
rule for the ternary?



or

  return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask)
 ? uart->fifo_size : 0;

or

  return ((ns_read_reg(uart, UART_LSR) & uart->lsr_mask) == uart->lsr_mask
  ? uart->fifo_size
  : 0);

(not going to list more variants which are all okay).

In any event, a fundamental requirement of mine is that such a tool would
only apply adjustments when and where style is actively violated. I.e. in
the case here:

 return ((ns_read_reg(uart, UART_LSR) &
  uart->lsr_mask) == uart->lsr_mask) ? uart->fifo_size : 0;

That's not overly neat wrapping, but in line with our style. If the other
form was demanded going forward, I'd be curious how you'd verbally
describe the requirement in ./CODING_STYLE.

I believe this can be stated around the fact that we need to indent,
e.g. apply the same rule as for other constructs already in use

Except here the tool didn't merely adjust indentation, but moved tokens
between lines.

Again, if it moves, but doesn't break the style - then it is going to happen
only once while applying big-scary-patch.



@@ -275,9 +277,10 @@ static void pci_serial_early_init(struct ns16550 *uart)
#ifdef NS16550_PCI
if ( uart->bar && uart->io_base >= 0x1 )
{
-pci_conf_write16(PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1],
-  uart->ps_bdf[2]),
- PCI_COMMAND, PCI_COMMAND_MEMORY);
+pci_conf_write16(
+PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1], uart->ps_bdf[2]),
+PCI_COMMAND,
+PCI_COMMAND_MEMORY);
return;
}

Hmm, transforming a well-formed block into another well-formed one. No
gain? (Same again further down.)

No, gain from human point of view
But there is a gain that it is now formatted automatically.

See above: I'd first like to see a written, textual description for all these
requirements. After all it needs to be possible for a human to write code
that the tool then wouldn't try to re-arrange. Which in turn requires that
the restrictions / constraints on the layout are spelled out.

Agree, the existing coding style document will require some extension:
at least clarifications and addition of the rules not described yet.

   I'm not looking
forward to pass all my patches through such a tool. I can write style-
conforming code pretty well, with - of course - occasional oversights,

Which the tool will allow not to have for less accurate developers

I fear I don't understand this reply of yours.

I mean that you can write such a well formatted code without any tool.
But there are others who can't. Then the tool will help others to avoid
code style violations.



   right
now. And that in multiple projects all with different styles. I expect to be
in the position to do so also going forward. This, imo, requires that there
be left some room for variations. Which 

Re: xl create/save throwing errors

2025-02-19 Thread Petr Beneš
On Wed, Feb 19, 2025 at 5:53 PM Petr Beneš  wrote:
>
> On Wed, Feb 19, 2025 at 5:04 PM Petr Beneš  wrote:
> >
> > Hello,
> >
>
> To add more information and observations:

Even more observations. This is from a run where 4 vms (4 VCPUs each)
were being created in parallel:

```
Saving to /clones/win10-18362-102/state new xl format (info 0x3/0x0/1780)
xc: info: Saving domain 14, type x86 HVM
xc: error: save callback suspend() failed: 0: Internal error
xc: error: Save failed (0 = Success): Internal error
libxl: error: libxl_qmp.c:1334:qmp_ev_lock_aquired: Domain 14:Failed
to connect to QMP socket /var/run/xen/qmp-libxl-14: No such file or
directory
libxl: error: libxl_dom_save.c:246:switch_qemu_xen_logdirty_done:
Domain 14:logdirty switch failed (rc=-3), abandoning suspend
xc: error: Couldn't disable qemu log-dirty mode (0 = Success): Internal error
xc: error: Failed to clean up (0 = Success): Internal error
libxl: error: libxl_stream_write.c:347:libxl__xc_domain_save_done:
Domain 14:saving domain: domain responded to suspend request: Success
Failed to save domain, resuming domain
libxl: error: libxl_qmp.c:1334:qmp_ev_lock_aquired: Domain 14:Failed
to connect to QMP socket /var/run/xen/qmp-libxl-14: No such file or
directory
libxl: error: libxl_dom_suspend.c:610:dm_resume_done: Domain 14:Failed
to resume device model: rc=-3
```

But... running afterwards:

```
# xl list
NameID   Mem VCPUs  State   Time(s)
Domain-0 0 16384 6 r- 475.1
win10-18362-102 17  2048 4 -b  30.8
```

And:
```
# lsa /var/run/xen/
total 16K
drwxr-xr-x  2 root root  160 Feb 19 17:13 .
drwxr-xr-x 36 root root 1.1K Feb 19 17:07 ..
-rw-r--r--  1 root root   28 Feb 19 17:13 domid-history
-rw---  1 root root4 Feb 19 17:06 qemu-dom0.pid
srwxr-xr-x  1 root root0 Feb 19 17:13 qmp-libxenstat-17
srwxr-xr-x  1 root root0 Feb 19 17:13 qmp-libxl-17
-rw---  1 root root5 Feb 19 17:06 xenconsoled.pid
-rw-r-  1 root root4 Feb 19 17:06 xenstored.pid
```

The logs complain about a domain ID 14, however, the domain ID of the
win10-18362-102 is later observed to be 17.

P.



[PATCH] xen/arm: Create GIC node using the node name from host dt

2025-02-19 Thread Michal Orzel
At the moment the GIC node we create for hwdom has a name
"interrupt-controller". Change it so that we use the same name as the
GIC node from host device tree. This is done for at least 2 purposes:
1) The convention in DT spec is that a node name with "reg" property
is formed "node-name@unit-address".
2) With DT overlay feature, many overlays refer to the GIC node using
the symbol under __symbols__ that we copy to hwdom 1:1. With the name
changed, the symbol is no longer valid and requires error prone manual
change by the user.

The unit-address part of the node name always refers to the first
address in the "reg" property which in case of GIC, always refers to
GICD and hwdom uses host memory layout.

Signed-off-by: Michal Orzel 
---
 xen/arch/arm/domain_build.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 7b47abade196..e760198d8609 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1615,6 +1615,7 @@ static int __init make_gic_node(const struct domain *d, 
void *fdt,
 int res = 0;
 const void *addrcells, *sizecells;
 u32 addrcells_len, sizecells_len;
+const char *name;
 
 /*
  * Xen currently supports only a single GIC. Discard any secondary
@@ -1628,7 +1629,11 @@ static int __init make_gic_node(const struct domain *d, 
void *fdt,
 
 dt_dprintk("Create gic node\n");
 
-res = fdt_begin_node(fdt, "interrupt-controller");
+/* Use the same name as the GIC node in host device tree */
+name = strrchr(gic->full_name, '/');
+name = name ? name + 1 : gic->full_name;
+
+res = fdt_begin_node(fdt, name);
 if ( res )
 return res;
 
-- 
2.25.1




Re: [PATCH for 4.21 v6 1/2] xen/riscv: drop CONFIG_RISCV_ISA_RV64G

2025-02-19 Thread Oleksii Kurochko


On 2/18/25 6:03 PM, Jan Beulich wrote:

--- a/xen/arch/riscv/arch.mk
+++ b/xen/arch/riscv/arch.mk
@@ -6,8 +6,13 @@ $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
  riscv-abi-$(CONFIG_RISCV_32) := -mabi=ilp32
  riscv-abi-$(CONFIG_RISCV_64) := -mabi=lp64
  
-riscv-march-$(CONFIG_RISCV_ISA_RV64G) := rv64g

-riscv-march-$(CONFIG_RISCV_ISA_C)   := $(riscv-march-y)c
+riscv-march-$(CONFIG_RISCV_64) := rv64
+
+riscv-march-y := $(riscv-march-y)ima
+
+riscv-march-$(CONFIG_RISCV_ISA_C) := $(riscv-march-y)c
+
+riscv-march-y := $(riscv-march-y)_zicsr_zifencei

The repeated use of := makes this longer than necessary, and hence harder to
read. I understand using += isn't exactly ideal either, because then on the rhs
no blanks may appear (aiui), being kind of against our style and potentially
hampering readability. Still maybe:

riscv-march-$(CONFIG_RISCV_64) := rv64
riscv-march-y+=ima
riscv-march-$(CONFIG_RISCV_ISA_C)+=c
riscv-march-y+=_zicsr_zifencei

?


Btw, I think that we will still anyway strip spaces added by '+='. So it will 
also need to do something like:
  [1] riscv-generic-flags := $(riscv-abi-y) -march=$(subst 
$(space),,$(riscv-march-y))

As without this I expect that -march will look like:
  -march=rv64 ima c _zicsr_zifencei

With the change [1] we could have spaces around "+=":
  riscv-march-y += ima
  riscv-march-$(CONFIG_RISCV_ISA_C) += c
  riscv-march-y += _zicsr_zifencei

  riscv-generic-flags := $(riscv-abi-y) -march=$(subst 
$(space),,$(riscv-march-y))

~ Oleksii


Re: [BUG?] Wrong RC reported during 'make install'

2025-02-19 Thread Andrew Cooper
On 13/02/2025 7:54 am, Jan Beulich wrote:
> On 13.02.2025 01:51, Andrew Cooper wrote:
>> On 12/02/2025 9:52 pm, Stefano Stabellini wrote:
>>> On Wed, 12 Feb 2025, Oleksii Kurochko wrote:
 Hello everyone,

 During the installation of Xen on an ARM server machine from the source 
 code,
 I found that the wrong release candidate (rc) is being used:
   $ make install  
 install -m0644 -p xen //boot/xen-4.20-rc  
 install: cannot remove ‘//boot/xen-4.20-rc’: Permission denied  
 make[1]: *** [Makefile:507: _install] Error 1
 My expectation is that it should be xen-4.20-rc4.

 I'm not sure if this behavior is intentional or if users are expected to 
 set
 the XEN_VENDORVERSION variable manually to ensure the correct release
 candidate number.

 In my opinion, we should set the proper release candidate number after
 "xen-4.20-rc" automatically.

 Does anyone have any thoughts or suggestions on how to resolve this issue?
>>> Hi Oleksii,
>>>
>>> I did a quick test and I see exactly the same on x86 as well. This patch
>>> fixes it, but then it would need someone to update the RC number in
>>> xen/Makefile every time a new RC is made.
>>>
>>> ---
>>> xen: add RC version number to xen filename
>>>
>>> Signed-off-by: Stefano Stabellini 
>> This is a direct consequence of the request to keep XEN_EXTRAVERSION at
>> "-rc" throughout the release cycle.
>>
>> I'm having to manually edit that simply to create the tarballs
>> correctly, which in turn means that the tarball isn't a byte-for-byte
>> identical `git archive` of the tag it purports to be.
> Just for my understanding - may I ask why this editing is necessary?
> Other release technicians never mentioned the (indeed undesirable)
> need to do so.

I did point it out.  I also needed to get RC1 cut and everyone had left
the office.

xen.git$ make src-tarball-release && tar tf dist/xen-4.20-rc.tar.gz | head

Source tarball in /home/andrew/xen.git/dist/xen-4.20-rc.tar.gz
xen-4.20-rc/
xen-4.20-rc/.github/
xen-4.20-rc/.github/workflows/
xen-4.20-rc/.github/workflows/coverity.yml
xen-4.20-rc/.gitarchive-info
xen-4.20-rc/Makefile
xen-4.20-rc/stubdom/
xen-4.20-rc/stubdom/Makefile
xen-4.20-rc/stubdom/grub/
xen-4.20-rc/stubdom/grub/Makefile

mktarball uses `$(MAKE) -C xen xenversion` which uses XEN_EXTRAVERSION.

XEN_EXTRAVERSION needs both the .0 and the RC number in order to make
the tarball with the correct name and correct top directory.

What I didn't anticipate was that, while editing XEN_EXTRAVERSION
locally gets a proper tarball, the contents within the tarball are
nonspecific as to the RC, hence Oleksii's observation.

It also means the tarball wasn't a straight `git archive` of the tree,
which is one of the reasons behind taking out the sub-repos.
>> I'd not twigged that it mean the builds from the tarballs reported false
>> information too.
>>
>> While I appreciate the wish to not have a commit per RC bumping
>> XEN_EXTRAVERSION, I think the avoidance of doing so is creating more
>> problems than it solves, and we should revert back to the prior way of
>> doing things.
> Sure, if it truly is getting in the way, then it needs re-considering.
> Just to mention it: Then the question is going to be though whether
> really to merely adjust XEN_EXTRAVERSION, or whether instead to do
> this consistently in all (three?) places.

It's only XEN_EXTRAVERSION which needs to change (I think).

I think README and SUPPORT.md are fine to say as they are, for
generically -rc.


Oleksii has asked for RC5, and we're overdue.  I'm intending to commit:

diff --git a/xen/Makefile b/xen/Makefile
index 65b460e2b480..4e37fff92514 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -6,7 +6,7 @@ this-makefile := $(call lastword,$(MAKEFILE_LIST))
 # All other places this is stored (eg. compile.h) should be autogenerated.
 export XEN_VERSION   = 4
 export XEN_SUBVERSION    = 20
-export XEN_EXTRAVERSION ?= -rc$(XEN_VENDORVERSION)
+export XEN_EXTRAVERSION ?= .0-rc5$(XEN_VENDORVERSION)
 export XEN_FULLVERSION   =
$(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
 -include xen-version
 
in order to make that happen properly, and finally have the tarball be a
straight `git archive` invocation.

Does this sound acceptable?

~Andrew



  1   2   >