[PATCH] x86/hpet: do local APIC EOI after interrupt processing

2025-08-05 Thread Roger Pau Monne
The current logic in the HPET interrupt ->ack() hook will perform a local APIC EOI ahead of enabling interrupts, possibly leading to recursion in the interrupt handler. Fix this by doing the local APIC EOI strictly after the window with interrupts enabled, as that prevents the recursion, and would

[PATCH v4 2/8] pdx: provide a unified set of unit functions

2025-08-05 Thread Roger Pau Monne
The current setup (pdx_init_mask() and pdx_region_mask()) and init (pfn_pdx_hole_setup()) PDX compression functions are tailored to the existing PDX compression algorithm. In preparation for introducing a new compression algorithm convert the setup and init functions to more generic interfaces tha

[PATCH v4 1/8] kconfig: turn PDX compression into a choice

2025-08-05 Thread Roger Pau Monne
Rename the current CONFIG_PDX_COMPRESSION to CONFIG_PDX_MASK_COMPRESSION, and make it part of the PDX compression choice block, in preparation for adding further PDX compression algorithms. The PDX compression defaults should still be the same for all architectures, however the choice block cannot

[PATCH v4 7/8] pdx: introduce a new compression algorithm based on region offsets

2025-08-05 Thread Roger Pau Monne
With the appearance of Intel Sierra Forest and Granite Rapids it's now possible to get a production x86 host with the following memory map: SRAT: Node 0 PXM 0 [, 7fff] SRAT: Node 0 PXM 0 [0001, 00807fff] SRAT: Node 1 PXM 1 [063e8000, 06be

[PATCH v4 4/8] pdx: allow per-arch optimization of PDX conversion helpers

2025-08-05 Thread Roger Pau Monne
There are four performance critical PDX conversion helpers that do the PFN to/from PDX and the physical addresses to/from directmap offsets translations. In the absence of an active PDX compression, those functions would still do the calculations needed, just to return the same input value as no t
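
A minimal sketch of what a mask-style PFN/PDX conversion can look like, to illustrate why the helpers still do work when no compression is active. The struct, its field names and the sketch_* functions are hypothetical and only modelled on the general idea of squeezing a hole out of the PFN space; they are not the code from this patch.

```c
#include <stdint.h>

/* Hypothetical parameter set for a mask-style compression scheme. */
struct pdx_mask_params {
    uint64_t bottom_mask;    /* PFN bits kept in place below the hole */
    uint64_t top_mask;       /* PFN bits located above the hole */
    unsigned int hole_shift; /* number of address bits squeezed out */
};

/* No compression: every bit is a "bottom" bit and nothing is shifted. */
static const struct pdx_mask_params pdx_identity = { ~UINT64_C(0), 0, 0 };

/* Drop the hole: keep the low bits, shift the high bits down. */
static inline uint64_t sketch_pfn_to_pdx(const struct pdx_mask_params *p,
                                         uint64_t pfn)
{
    return (pfn & p->bottom_mask) | ((pfn & p->top_mask) >> p->hole_shift);
}

/* Re-insert the hole: keep the low bits, shift the high bits back up. */
static inline uint64_t sketch_pdx_to_pfn(const struct pdx_mask_params *p,
                                         uint64_t pdx)
{
    return (pdx & p->bottom_mask) | ((pdx << p->hole_shift) & p->top_mask);
}
```

With pdx_identity the two masks and the shift are still evaluated on every call even though the result equals the input, which is the kind of overhead a per-arch optimization could avoid.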

[PATCH v4 8/8] x86/mm: adjust loop in arch_init_memory() to iterate over the PDX space

2025-08-05 Thread Roger Pau Monne
There's a loop in arch_init_memory() that iterates over holes and non-RAM regions to possibly mark any page_info structures matching those addresses as IO. The looping there is done over the PFN space. PFNs not covered by the PDX space will always fail the mfn_valid() check, hence re-write the lo

[PATCH v4 6/8] pdx: move some helpers in preparation for new compression

2025-08-05 Thread Roger Pau Monne
Move fill_mask(), pdx_region_mask() and pdx_init_mask() to the !CONFIG_PDX_NONE section in preparation of them also being used by a newly added PDX compression. No functional change intended. Signed-off-by: Roger Pau Monné Acked-by: Jan Beulich --- git is not very helpful when generating the di

[PATCH v4 5/8] test/pdx: add PDX compression unit tests

2025-08-05 Thread Roger Pau Monne
Introduce a set of unit tests for PDX compression. The unit tests contain both real and crafted memory maps that are then compressed using the selected PDX algorithm. Note the build system for the unit tests has been done in a way to support adding new compression algorithms easily. That requir

[PATCH v4 3/8] pdx: introduce command line compression toggle

2025-08-05 Thread Roger Pau Monne
Introduce a command line option to allow disabling PDX compression. The disabling is done by turning pfn_pdx_add_region() into a no-op, so when attempting to initialize the selected compression algorithm the array of ranges to compress is empty. Signed-off-by: Roger Pau Monné Reviewed-by: Jan Be
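
A minimal sketch of the approach described above: when the toggle is off, registering a region does nothing, so the later setup step sees an empty array. All names here (opt_compress, sketch_add_region(), sketch_compression_setup(), MAX_RANGES) are hypothetical stand-ins, not the identifiers used by the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RANGES 16 /* hypothetical capacity */

struct pfn_range { unsigned long base, size; };

static struct pfn_range ranges[MAX_RANGES];
static size_t nr_ranges;

/* Hypothetical toggle, assumed to be set from the command line. */
static bool opt_compress = true;

/* Record a RAM region; a no-op when compression is disabled. */
void sketch_add_region(unsigned long base, unsigned long size)
{
    if (!opt_compress || nr_ranges >= MAX_RANGES)
        return;

    ranges[nr_ranges].base = base;
    ranges[nr_ranges].size = size;
    nr_ranges++;
}

/* Returns true only if there is at least one range to compress. */
bool sketch_compression_setup(void)
{
    return nr_ranges != 0;
}
```

The design keeps the disable logic in a single place: the selected algorithm needs no awareness of the option, it just finds nothing to work with.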

[PATCH v4 0/8] pdx: introduce a new compression algorithm

2025-08-05 Thread Roger Pau Monne
pdx values. Thanks, Roger. Roger Pau Monne (8): kconfig: turn PDX compression into a choice pdx: provide a unified set of unit functions pdx: introduce command line compression toggle pdx: allow per-arch optimization of PDX conversion helpers test/pdx: add PDX compression unit tests

[PATCH v3 4/8] pdx: allow per-arch optimization of PDX conversion helpers

2025-07-24 Thread Roger Pau Monne
There are four performance critical PDX conversion helpers that do the PFN to/from PDX and the physical addresses to/from directmap offsets translations. In the absence of an active PDX compression, those functions would still do the calculations needed, just to return the same input value as no t

[PATCH v3 7/8] pdx: introduce a new compression algorithm based on region offsets

2025-07-24 Thread Roger Pau Monne
With the appearance of Intel Sierra Forest and Granite Rapids it's now possible to get a production x86 host with the following memory map: SRAT: Node 0 PXM 0 [, 7fff] SRAT: Node 0 PXM 0 [0001, 00807fff] SRAT: Node 1 PXM 1 [063e8000, 06be

[PATCH v3 1/8] kconfig: turn PDX compression into a choice

2025-07-24 Thread Roger Pau Monne
Rename the current CONFIG_PDX_COMPRESSION to CONFIG_PDX_MASK_COMPRESSION, and make it part of the PDX compression choice block, in preparation for adding further PDX compression algorithms. The PDX compression defaults should still be the same for all architectures, however the choice block cannot

[PATCH v3 3/8] pdx: introduce command line compression toggle

2025-07-24 Thread Roger Pau Monne
Introduce a command line option to allow disabling PDX compression. The disabling is done by turning pfn_pdx_add_region() into a no-op, so when attempting to initialize the selected compression algorithm the array of ranges to compress is empty. Signed-off-by: Roger Pau Monné Reviewed-by: Jan Be

[PATCH v3 8/8] x86/mm: adjust loop in arch_init_memory() to iterate over the PDX space

2025-07-24 Thread Roger Pau Monne
There's a loop in arch_init_memory() that iterates over holes and non-RAM regions to possibly mark any page_info structures matching those addresses as IO. The looping there is done over the PFN space. PFNs not covered by the PDX space will always fail the mfn_valid() check, hence re-write the lo

[PATCH v3 6/8] pdx: move some helpers in preparation for new compression

2025-07-24 Thread Roger Pau Monne
Move fill_mask(), pdx_region_mask() and pdx_init_mask() to the !CONFIG_PDX_NONE section in preparation of them also being used by a newly added PDX compression. No functional change intended. Signed-off-by: Roger Pau Monné Acked-by: Jan Beulich --- git is not very helpful when generating the di

[PATCH v3 5/8] test/pdx: add PDX compression unit tests

2025-07-24 Thread Roger Pau Monne
Introduce a set of unit tests for PDX compression. The unit tests contain both real and crafted memory maps that are then compressed using the selected PDX algorithm. Note the build system for the unit tests has been done in a way to support adding new compression algorithms easily. That requir

[PATCH v3 2/8] pdx: provide a unified set of unit functions

2025-07-24 Thread Roger Pau Monne
The current setup (pdx_init_mask() and pdx_region_mask()) and init (pfn_pdx_hole_setup()) PDX compression functions are tailored to the existing PDX compression algorithm. In preparation for introducing a new compression algorithm convert the setup and init functions to more generic interfaces tha

[PATCH v3 0/8] pdx: introduce a new compression algorithm

2025-07-24 Thread Roger Pau Monne
values. Thanks, Roger. Roger Pau Monne (8): kconfig: turn PDX compression into a choice pdx: provide a unified set of unit functions pdx: introduce command line compression toggle pdx: allow per-arch optimization of PDX conversion helpers test/pdx: add PDX compression unit tests pdx: move

[PATCH] char/ns16550: avoid additions to NULL pointer

2025-07-24 Thread Roger Pau Monne
Clang UBSAN reports: UBSAN: Undefined behaviour in drivers/char/ns16550.c:124:49 applying non-zero offset 0001 to null pointer And UBSAN: Undefined behaviour in drivers/char/ns16550.c:142:49 applying non-zero offset 0001 to null pointer Move calculation of the MMIO addre

[PATCH] x86/hvmloader: adjust strtoll() to parse hex numbers without 0x prefix

2025-07-23 Thread Roger Pau Monne
The current strtoll() implementation in hvmloader requires hex numbers to be prefixed with 0x, otherwise strtoll() won't parse them correctly even when calling the function with base == 16. Fix this by not unconditionally setting the base to 10 when the string is not 0 prefixed, this also allows pa
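
A minimal sketch of base handling that honours a caller-supplied base of 16 whether or not the 0x prefix is present. sketch_strtoll() and digit_value() are hypothetical names; this is not the hvmloader implementation, and overflow handling is deliberately left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of a digit character in bases up to 16, or -1 if not a digit. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

long long sketch_strtoll(const char *s, const char **end, int base)
{
    long long val = 0;
    int neg = 0, d;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '-' || *s == '+')
        neg = (*s++ == '-');

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X') &&
        (base == 0 || base == 16) && digit_value(s[2]) >= 0) {
        /* Optional 0x prefix: skip it and parse as hex. */
        s += 2;
        base = 16;
    } else if (base == 0) {
        /* Auto-detect only when the caller did not choose a base. */
        base = (s[0] == '0') ? 8 : 10;
    }

    /* Accumulate digits valid in the selected base (no overflow check). */
    while ((d = digit_value(*s)) >= 0 && d < base) {
        val = val * base + d;
        s++;
    }

    if (end)
        *end = s;
    return neg ? -val : val;
}
```

The key point is that the base is only auto-detected when it was passed as 0, so "ff" with base 16 parses the same as "0xff".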

[PATCH] xen/livepatch: fixup relocations to replaced symbols

2025-07-16 Thread Roger Pau Monne
In a livepatch payload relocations will refer to included functions. If that function happens to be a replacement for an existing Xen function, the relocations on the livepatch payload will use the newly introduced symbol, rather than the old one. This is usually fine, but if the result of the re

[PATCH 2/2] x86/ept: batch PML p2m type-changes into single locked region

2025-07-15 Thread Roger Pau Monne
The current p2m type-change loop in ept_vcpu_flush_pml_buffer() relies on each call to p2m_change_type_one() taking the p2m lock, doing the change and then dropping the lock and flushing the p2m. Instead take the p2m lock outside of the loop, so that calls to gfn_{,un}lock() inside p2m_change_type
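
A minimal sketch of the batching idea, under the assumption that the lock is recursive and that the flush is issued only when the outermost holder releases it. The types and functions below are hypothetical models that merely count flushes; they are not Xen's p2m locking code.

```c
#include <stddef.h>

/* Hypothetical model of a recursive lock that flushes on final release. */
struct sketch_lock {
    unsigned int depth;   /* current recursion depth */
    unsigned int flushes; /* flushes issued so far */
};

static void lock_acquire(struct sketch_lock *l)
{
    l->depth++;
}

static void lock_release(struct sketch_lock *l)
{
    /* Assumption: only the outermost release triggers a flush. */
    if (--l->depth == 0)
        l->flushes++;
}

/* Stand-in for a single type change, which locks internally. */
static void change_one(struct sketch_lock *l, unsigned long gfn)
{
    lock_acquire(l);
    (void)gfn; /* the actual type change would happen here */
    lock_release(l);
}

/* Unbatched: one lock round trip, and hence one flush, per entry. */
void change_each(struct sketch_lock *l, const unsigned long *gfns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        change_one(l, gfns[i]);
}

/* Batched: the outer lock turns the inner acquisitions into nested ones. */
void change_batched(struct sketch_lock *l, const unsigned long *gfns, size_t n)
{
    lock_acquire(l);
    for (size_t i = 0; i < n; i++)
        change_one(l, gfns[i]);
    lock_release(l);
}
```

Under these assumptions, processing three entries costs three flushes with change_each() and a single flush with change_batched().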

[PATCH 1/2] x86/ept: move vmx_domain_flush_pml_buffers() to p2m-ept.c

2025-07-15 Thread Roger Pau Monne
No functional change intended. Signed-off-by: Roger Pau Monné --- xen/arch/x86/hvm/vmx/vmcs.c | 59 + xen/arch/x86/hvm/vmx/vmx.c | 2 +- xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 1 - xen/arch/x86/include/asm/hvm/vmx/vmx.h | 1 + xen/arch/x86/

[PATCH 0/2] x86/ept: batch PML type-changes into single locked region

2025-07-15 Thread Roger Pau Monne
cleanup. The patch here doesn't make things better, but I don't think it makes them any worse either. Thanks, Roger. Roger Pau Monne (2): x86/ept: move vmx_domain_flush_pml_buffers() to p2m-ept.c x86/ept: batch PML p2m type-changes into single locked region xen/arch/x86/hvm/

[PATCH] tools/golang: update auto-generated libxl based types

2025-07-02 Thread Roger Pau Monne
As a result of the addition of a new field in libxl domain build info structure the golang types need to be regenerated, this was missing as part of 22650d6054. Regenerate the headers now. Reported-by: Juergen Gross Fixes: 22650d605462 ('x86/hvmloader: select xen platform pci MMIO BAR UC or WB M

[PATCH v2 4/8] pdx: introduce command line compression toggle

2025-06-20 Thread Roger Pau Monne
Introduce a command line option to allow disabling PDX compression. The disabling is done by turning pfn_pdx_add_region() into a no-op, so when attempting to initialize the selected compression algorithm the array of ranges to compress is empty. Signed-off-by: Roger Pau Monné --- Changes since v

[PATCH v2 6/8] test/pdx: add PDX compression unit tests

2025-06-20 Thread Roger Pau Monne
Introduce a set of unit tests for PDX compression. The unit tests contain both real and crafted memory maps that are then compressed using the selected PDX algorithm. Note the build system for the unit tests has been done in a way to support adding new compression algorithms easily. That requir

[PATCH v2 3/8] pdx: provide a unified set of unit functions

2025-06-20 Thread Roger Pau Monne
The current setup (pdx_init_mask() and pdx_region_mask()) and init (pfn_pdx_hole_setup()) PDX compression functions are tailored to the existing PDX compression algorithm. In preparation for introducing a new compression algorithm convert the setup and init functions to more generic interfaces tha

[PATCH v2 2/8] kconfig: turn PDX compression into a choice

2025-06-20 Thread Roger Pau Monne
Rename the current CONFIG_PDX_COMPRESSION to CONFIG_PDX_MASK_COMPRESSION, and make it part of the PDX compression choice block, in preparation for adding further PDX compression algorithms. No functional change intended as the PDX compression defaults should still be the same for all architectures

[PATCH v2 8/8] pdx: introduce a new compression algorithm based on region offsets

2025-06-20 Thread Roger Pau Monne
With the appearance of Intel Sierra Forest and Granite Rapids it's now possible to get a production x86 host with the following memory map: SRAT: Node 0 PXM 0 [, 7fff] SRAT: Node 0 PXM 0 [0001, 00807fff] SRAT: Node 1 PXM 1 [063e8000, 06be

[PATCH v2 7/8] pdx: move some helpers in preparation for new compression

2025-06-20 Thread Roger Pau Monne
Move fill_mask(), pdx_region_mask() and pdx_init_mask() to the !CONFIG_PDX_NONE section in preparation of them also being used by a newly added PDX compression. No functional change intended. Signed-off-by: Roger Pau Monné --- git is not very helpful when generating the diff here, and it ends up

[PATCH v2 5/8] pdx: allow per-arch optimization of PDX conversion helpers

2025-06-20 Thread Roger Pau Monne
There are four performance critical PDX conversion helpers that do the PFN to/from PDX and the physical addresses to/from directmap offsets translations. In the absence of an active PDX compression, those functions would still do the calculations needed, just to return the same input value as no t

[PATCH v2 1/8] x86/pdx: simplify calculation of domain struct allocation boundary

2025-06-20 Thread Roger Pau Monne
When not using CONFIG_BIGMEM there are some restrictions in the address width for allocations of the domain structure, as it's PDX truncated to 32 bits it's stashed into page_info structure for domain allocated pages. The current logic to calculate this limit is based on the internals of the PDX c

[PATCH v2 0/8] pdx: introduce a new compression algorithm

2025-06-20 Thread Roger Pau Monne
functions and adding a unit test for PDX compression. Patch 8 introduces the new compression. The new compression is only enabled by default on x86, other architectures are left with their previous defaults. Thanks, Roger. Roger Pau Monne (8): x86/pdx: simplify calculation of domain struct allocation

[PATCH v5] x86/hvmloader: select xen platform pci MMIO BAR UC or WB MTRR cache attribute

2025-06-13 Thread Roger Pau Monne
The Xen platform PCI device (vendor ID 0x5853) exposed to x86 HVM guests doesn't have the functionality of a traditional PCI device. The exposed MMIO BAR is used by some guests (including Linux) as a safe place to map foreign memory, including the grant table itself. Traditionally BARs from devic

[PATCH 8/8] pdx: introduce a command line option for offset compression

2025-06-11 Thread Roger Pau Monne
Allow controlling whether to attempt PDX compression, and which algorithm to use to calculate the coefficients. Document the option and also add a CHANGELOG entry for the newly added feature. Note the work has been originally done to cope with the new Intel Sapphire/Granite Rapids, however the co

[PATCH 6/8] pdx: introduce a new compression algorithm based on offsets between regions

2025-06-11 Thread Roger Pau Monne
With the appearance of Intel Sierra Forest and Granite Rapids it's now possible to get a production x86 host with the following memory map: SRAT: Node 0 PXM 0 [, 7fff] SRAT: Node 0 PXM 0 [0001, 00407fff] SRAT: Node 1 PXM 1 [061e8000, 065e7

[PATCH 2/8] pdx: introduce function to calculate max PFN based on PDX compression

2025-06-11 Thread Roger Pau Monne
This is the code already present and used by x86 in setup_max_pdx(), which takes into account the current PDX compression, plus the limitation of the virtual memory layout to return the maximum usable PFN in the system, possibly truncating the input PFN provided by the caller. This helper will be

[PATCH 4/8] pdx: provide a unified set of unit functions

2025-06-11 Thread Roger Pau Monne
The current setup (pdx_init_mask() and pdx_region_mask()) and init (pfn_pdx_hole_setup()) PDX compression functions are tailored to the existing PDX compression algorithm. In preparation for introducing a new compression algorithm convert the setup and init functions to more generic interfaces tha

[PATCH 1/8] x86/pdx: simplify calculation of domain struct allocation boundary

2025-06-11 Thread Roger Pau Monne
When not using CONFIG_BIGMEM there are some restrictions in the address width for allocations of the domain structure, as it's PDX truncated to 32 bits it's stashed into page_info structure for domain allocated pages. The current logic to calculate this limit is based on the internals of the PDX co

[PATCH 7/8] pdx: introduce translation helpers for offset compression

2025-06-11 Thread Roger Pau Monne
Implement the helpers to translate from pfns or physical addresses into the offset compressed index space. Add a further check in the PDX testing to ensure conversion resulting from the added functions is bi-directional. Signed-off-by: Roger Pau Monné --- tools/tests/pdx/test-pdx-offset.c | 10

[PATCH 5/8] pdx: allow optimizing PDX conversion helpers

2025-06-11 Thread Roger Pau Monne
There are four performance critical PDX conversion helpers that do the PFN to/from PDX and the physical addresses to/from directmap offsets translations. In the absence of an active PDX compression, those functions would still do the calculations needed, just to return the same input value as no t

[PATCH 3/8] kconfig: turn PDX compression into a choice

2025-06-11 Thread Roger Pau Monne
Rename the current CONFIG_PDX_COMPRESSION to CONFIG_PDX_MASK_COMPRESSION, and make it part of the PDX compression choice block, in preparation for adding further PDX compression algorithms. No functional change intended as the PDX compression defaults should still be the same for all architectures

[PATCH 0/8] pdx: introduce a new compression algorithm

2025-06-11 Thread Roger Pau Monne
architectures are left with their previous defaults. Thanks, Roger. Roger Pau Monne (8): x86/pdx: simplify calculation of domain struct allocation boundary pdx: introduce function to calculate max PFN based on PDX compression kconfig: turn PDX compression into a choice pdx: provide a unified set

[PATCH v4] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute

2025-06-10 Thread Roger Pau Monne
The Xen PCI device (vendor ID 0x5853) exposed to x86 HVM guests doesn't have the functionality of a traditional PCI device. The exposed MMIO BAR is used by some guests (including Linux) as a safe place to map foreign memory, including the grant table itself. Traditionally BARs from devices have t

[PATCH v3] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute

2025-06-05 Thread Roger Pau Monne
The Xen PCI device (vendor ID 0x5853) exposed to x86 HVM guests doesn't have the functionality of a traditional PCI device. The exposed MMIO BAR is used by some guests (including Linux) as a safe place to map foreign memory, including the grant table itself. Traditionally BARs from devices have t

[PATCH v2] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute

2025-06-03 Thread Roger Pau Monne
The Xen PCI device (vendor ID 0x5853) exposed to x86 HVM guests doesn't have the functionality of a traditional PCI device. The exposed MMIO BAR is used by some guests (including Linux) as a safe place to map foreign memory, including the grant table itself. Traditionally BARs from devices have th

[PATCH] x86/hvmloader: don't set xenpci MMIO BAR as UC in MTRR

2025-05-30 Thread Roger Pau Monne
The Xen PCI device (vendor ID 0x5853) exposed to x86 HVM guests doesn't have the functionality of a traditional PCI device. The exposed MMIO BAR is used by some guests (including Linux) as a safe place to map foreign memory, including the grant table itself. Traditionally BARs from devices have th

[PATCH] x86/hvmloader: fix order of PCI vs MTRR initialization

2025-05-27 Thread Roger Pau Monne
After some recent change the order of MTRR vs PCI initialization is inverted. MTRR will get initialized ahead of PCI scanning and sizing of MMIO regions. As a result when setting up MTRRs the MMIO window below 4GB will always have the same size, and there will be no window above 4GB. This resu

[PATCH v3 0/3] x86/boot: provide better diagnostics in AP boot failure

2025-05-23 Thread Roger Pau Monne
correctly. Thanks, Roger. Roger Pau Monne (3): x86/boot: print CPU and APIC ID in bring up failure x86/traps: split code to dump execution state to a separate helper x86/boot: attempt to print trace and panic on AP bring up stall xen/arch/x86/include/asm/processor.h | 1 + xen/arch/x86

[PATCH v3 3/3] x86/boot: attempt to print trace and panic on AP bring up stall

2025-05-23 Thread Roger Pau Monne
With the current AP bring up code, Xen can get stuck indefinitely if an AP freezes during boot after the 'callin' step. Introduce a 5s timeout while waiting for APs to finish startup. On failure of an AP to complete startup, send an NMI to trigger the printing of a stack backtrace on the stuck AP

[PATCH v3 1/3] x86/boot: print CPU and APIC ID in bring up failure

2025-05-23 Thread Roger Pau Monne
Print the CPU and APIC ID that fails to respond to the init sequence, or that didn't manage to reach the "callin" state. Expand a bit the printed error messages. Otherwise the "Not responding." message is not easy to understand by users. Reported-by: Andrew Cooper Signed-off-by: Roger Pau Monné

[PATCH v3 2/3] x86/traps: split code to dump execution state to a separate helper

2025-05-23 Thread Roger Pau Monne
Split the code that triggers remote CPUs to dump stacks into a separate function. Also introduce a parameter that can be set by the caller of the newly introduced function to force CPUs to dump the full stack, rather than just dumping the current function name. No functional change intended. Sig

[PATCH 1/2] x86/vpci: fix off-by-one

2025-05-22 Thread Roger Pau Monne
rangeset_remove_range() uses inclusive ranges, and hence the end of the range should be calculated using PFN_DOWN(), not PFN_UP(). Fixes: 4acab25a9300 ('x86/vpci: fix handling of BAR overlaps with non-hole regions') Signed-off-by: Roger Pau Monné --- xen/arch/x86/pci.c | 2 +- 1 file changed, 1
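
A minimal sketch of the off-by-one, assuming 4 KiB pages. The PFN_DOWN()/PFN_UP() macros are defined locally for illustration, and struct pfn_span and inclusive_pfn_span() are hypothetical; the actual call site in xen/arch/x86/pci.c is not reproduced here.

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Frame containing address x. */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
/* First frame at or above address x (rounds up). */
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical inclusive frame range: both ends are part of the range. */
struct pfn_span { unsigned long first, last; };

/*
 * Convert an inclusive byte range [start, last_byte] into an inclusive
 * frame range.  The last frame is the one containing last_byte, hence
 * PFN_DOWN(); PFN_UP() would name the frame after it.
 */
static struct pfn_span inclusive_pfn_span(unsigned long start,
                                          unsigned long last_byte)
{
    struct pfn_span r = { PFN_DOWN(start), PFN_DOWN(last_byte) };

    return r;
}
```

For a one-page region at 0x1000 to 0x1fff the inclusive span is frame 1 to frame 1, whereas PFN_UP(0x1fff) evaluates to 2 and would wrongly pull the next frame into an inclusive range.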

[PATCH 2/2] x86/vpci: refuse to map BARs at position 0

2025-05-22 Thread Roger Pau Monne
A BAR at position 0 is not initialized (not positioned). While Xen could attempt to map it into the p2m, marking it as mapped will prevent dom0 from changing the position of the BAR, as the vPCI code has a shortcoming of not allowing writes to BAR registers while the BAR is mapped on the p2m. Work

[PATCH 0/2] x86/vpci: two fixes

2025-05-22 Thread Roger Pau Monne
Hello, Patch 1 fixes a regression reported by the Qubes ADL runner, so it's required to unblock the testing. Patch 2 is possibly more controversial, it's not strictly required to unblock the testing, but might be good to consider. Thanks, Roger. Roger Pau Monne (2): x86/vpci: fix

[PATCH 2/2] x86/numa: introduce per-NUMA node flush locks

2025-05-22 Thread Roger Pau Monne
Contention around the global flush_lock increases as the number of physical CPUs on the host also increases. Sadly this doesn't scale on big boxes. However most of the time Xen doesn't require broadcasting flushes to all CPUs on the system, and hence more fine grained (i.e. covering fewer CPUs) lock

[PATCH 0/2] x86/numa: introduce per-node flush_lock

2025-05-22 Thread Roger Pau Monne
t of contention around it. First patch is a preparatory change to allow using per-NUMA node locks, second patch introduces a per-node flush_lock. Thanks, Roger. Roger Pau Monne (2): x86/numa: add per-node lock profile objects x86/numa: introduce per-NUMA node flush locks tools/misc/xenlock

[PATCH 1/2] x86/numa: add per-node lock profile objects

2025-05-22 Thread Roger Pau Monne
Add some basic infrastructure to be able to use lockprofile with per NUMA node locks. This patch just introduces the required types, plus the printing of the data for the newly introduced type. There's no user of per NUMA node locks introduced here. Signed-off-by: Roger Pau Monné --- tools/mis

[PATCH v2 0/4] x86/boot: provide better diagnostics in AP boot failure

2025-05-22 Thread Roger Pau Monne
working correctly. Thanks, Roger. Roger Pau Monne (4): x86/boot: print CPU and APIC ID in bring up failure x86/traps: remove smp_mb() ahead of IPI dispatch x86/traps: split code to dump execution state to a separate helper x86/boot: attempt to print trace and panic on AP bring up stall

[PATCH v2 4/4] x86/boot: attempt to print trace and panic on AP bring up stall

2025-05-22 Thread Roger Pau Monne
With the current AP bring up code, Xen can get stuck indefinitely if an AP freezes during boot after the 'callin' step. Introduce a 5s timeout while waiting for APs to finish startup. On failure of an AP to complete startup, send an NMI to trigger the printing of a stack backtrace on the stuck AP

[PATCH v2 1/4] x86/boot: print CPU and APIC ID in bring up failure

2025-05-22 Thread Roger Pau Monne
Print the CPU and APIC ID that fails to respond to the init sequence, or that didn't manage to reach the "callin" state. Expand a bit the printed error messages. Otherwise the "Not responding." message is not easy to understand by users. Reported-by: Andrew Cooper Signed-off-by: Roger Pau Monné

[PATCH v2 3/4] x86/traps: split code to dump execution state to a separate helper

2025-05-22 Thread Roger Pau Monne
Split the code that triggers remote CPUs to dump stacks into a separate function. Also introduce a parameter that can be set by the caller of the newly introduced function to force CPUs to dump the full stack, rather than just dumping the current function name. No functional change intended. Sig

[PATCH v2 2/4] x86/traps: remove smp_mb() ahead of IPI dispatch

2025-05-22 Thread Roger Pau Monne
The IPI dispatch functions should already have the required barriers to ensure correct memory ordering. Note other callers of send_IPI_mask() don't use any barriers. Reported-by: Andrew Cooper Signed-off-by: Roger Pau Monné --- Changes since v1: - New in this version. --- xen/arch/x86/traps.c

[PATCH 0/2] x86/boot: provide better diagnostics in AP boot failure

2025-05-21 Thread Roger Pau Monne
Hello, Both patches attempt to improve AP boot failure diagnosis by improving the printed failure messages (patch 1) and detecting AP getting stuck during bringup (patch 2). They should be non-functional changes for systems working correctly. Thanks, Roger. Roger Pau Monne (2): x86/boot

[PATCH 1/2] x86/boot: print CPU number in bring up failure

2025-05-21 Thread Roger Pau Monne
Print the CPU ID that fails to respond to the init sequence, or that didn't manage to reach the "callin" state. Expand a bit the printed error messages. Otherwise the "Not responding." message is not easy to understand by users. Reported-by: Andrew Cooper Signed-off-by: Roger Pau Monné --- xe

[PATCH 2/2] x86/boot: attempt to print trace and panic on AP bring up stall

2025-05-21 Thread Roger Pau Monne
With the current AP bring up code Xen can get stuck indefinitely if an AP freezes during boot after the 'callin' step. Introduce a 10s timeout while waiting for APs to finish startup. On failure of an AP to complete startup send an NMI to trigger the printing of a stack backtrace on the stuck AP

[PATCH v3] x86/gnttab: do not implement GNTTABOP_cache_flush

2025-05-21 Thread Roger Pau Monne
The current underlying implementation of GNTTABOP_cache_flush on x86 won't work as expected. The provided {clean,invalidate}_dcache_va_range() helpers only do a local pCPU cache flush, so the cache of previous pCPUs where the vCPU might have run are not flushed. However instead of attempting to f

[PATCH v2 6/6] x86/hvm: reduce the need to flush caches in memory_type_changed()

2025-05-16 Thread Roger Pau Monne
The current cache flushing done in memory_type_changed() is too wide, and this doesn't scale on boxes with a high number of CPUs. Attempt to limit cache flushes as a result of p2m type changes, and only do them if: * The CPU doesn't support (or has broken) self-snoop capability, otherwise there

[PATCH v2 5/6] x86/hvm: limit memory type cache flush to running domains

2025-05-16 Thread Roger Pau Monne
Avoid the cache flush if the domain is not yet running. There shouldn't be any cached data resulting from domain accesses that need flushing, as the domain hasn't run yet. No change in domain observable behavior intended. Signed-off-by: Roger Pau Monné --- Changes since v1: - New in this versi

[PATCH v2 4/6] xen/x86: account for assigned PCI devices in cache_flush_permitted()

2025-05-16 Thread Roger Pau Monne
While unlikely, it's possible for PCI devices to not have any IO resources assigned, yet in such case the owner domain might still need to issue cache control operations in case the device performs DMA requests. Adjust cache_flush_permitted() to account for has_arch_pdevs(). While there also swit

[PATCH v2 2/6] x86/gnttab: do not implement GNTTABOP_cache_flush

2025-05-16 Thread Roger Pau Monne
The current underlying implementation of GNTTABOP_cache_flush on x86 won't work as expected. The provided {clean,invalidate}_dcache_va_range() helpers only do a local pCPU cache flush, so the cache of previous pCPUs where the vCPU might have run are not flushed. However instead of attempting to f

[PATCH v2 3/6] xen/x86: rename cache_flush_permitted() to has_arch_io_resources()

2025-05-16 Thread Roger Pau Monne
To better describe the underlying implementation. Define cache_flush_permitted() as an alias of has_arch_io_resources(), so that current users of cache_flush_permitted() are not effectively modified. With the introduction of the new handler, change some of the call sites of cache_flush_permitted(

[PATCH v2 0/6] xen: cache control improvements

2025-05-16 Thread Roger Pau Monne
Hello, The following series contains some fixes for cache control operations, the main focus is to reduce the load on big systems when cache control operations are executed. Patches 1-4 are bugfixes, while patches 5 and 6 are improvements to the current code. Thanks, Roger. Roger Pau Monne (6

[PATCH v2 1/6] x86/pv: fix emulation of wb{,no}invd to flush all pCPU caches

2025-05-16 Thread Roger Pau Monne
The current emulation of wb{,no}invd is bogus for PV guests: it will only flush the current pCPU cache, without taking into account pCPUs where the vCPU had run previously. Resort to flushing the cache on all host pCPUs to make it correct. Fixes: 799fed0a7cc5 ("Priv-op emulation in Xen, for RDMSR

[PATCH v2] x86/vpci: fix handling of BAR overlaps with non-hole regions

2025-05-16 Thread Roger Pau Monne
For once the message printed when a BAR overlaps with a non-hole region is not accurate on x86. While the BAR won't be mapped by the vPCI logic, it is quite likely overlapping with a reserved region in the memory map, and already mapped as by default all reserved regions are identity mapped in th

[PATCH] x86/iommu: use rangeset_subtract() in arch_iommu_hwdom_init()

2025-05-15 Thread Roger Pau Monne
Remove an open-coded instance of rangeset_subtract(). No functional change intended. Signed-off-by: Roger Pau Monné --- xen/drivers/passthrough/x86/iommu.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthroug

[PATCH] x86/vpci: fix handling of BAR overlaps with non-hole regions

2025-05-15 Thread Roger Pau Monne
For once the message printed when a BAR overlaps with a non-hole region is not accurate on x86. While the BAR won't be mapped by the vPCI logic, it is quite likely overlapping with a reserved region in the memory map, and already mapped as by default all reserved regions are identity mapped in th

[PATCH] xen: enable XEN_UNPOPULATED_ALLOC as part of xen.config

2025-05-14 Thread Roger Pau Monne
PVH dom0 is useless without XEN_UNPOPULATED_ALLOC, as otherwise it will very likely balloon out all dom0 memory to map foreign and grant pages. Enable it by default as part of xen.config. This also requires enabling MEMORY_HOTREMOVE and ZONE_DEVICE. Signed-off-by: Roger Pau Monné --- kernel/co

[PATCH] xen/x86: fix initial memory balloon target

2025-05-14 Thread Roger Pau Monne
When adding extra memory regions as ballooned pages also adjust the balloon target, otherwise when the balloon driver is started it will populate memory to match the target value and consume all the extra memory regions added. This made the usage of the Xen `dom0_mem=,max:` command line parameter

[PATCH 4/9] x86/gnttab: do not implement GNTTABOP_cache_flush

2025-05-06 Thread Roger Pau Monne
The current underlying implementation of GNTTABOP_cache_flush on x86 won't work as expected. The provided {clean,invalidate}_dcache_va_range() helpers only do a local pCPU cache flush, so the cache of previous pCPUs where the vCPU might have run are not flushed. However instead of attempting to f

[PATCH 7/9] xen/x86: rename cache_flush_permitted() to has_arch_io_resources()

2025-05-06 Thread Roger Pau Monne
To better describe the underlying implementation. Define cache_flush_permitted() as an alias of has_arch_io_resources(), so that current users of cache_flush_permitted() are not effectively modified. With the introduction of the new handler, change some of the call sites of cache_flush_permitted(

[PATCH 9/9] xen/x86: track dirty pCPU caches for a given vCPU

2025-05-06 Thread Roger Pau Monne
When a guest is allowed access to cache control operations, such tracking avoids having to issue a system-wide cache flush, and instead only flushes the pCPUs where the vCPU has been scheduled since the last flush. Note that domain-wide flushes accumulate the dirty caches from all the vCPUs, but cle
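
The tracking idea in the snippet above can be pictured with a small sketch. It is not the patch's code: the structure, the 64-pCPU limit, and every `sketch_` name are assumptions made for illustration, and real cache flushing is replaced by an optional callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit so a single 64-bit mask is enough for the sketch. */
#define SKETCH_MAX_PCPUS 64u

/* Hypothetical per-vCPU state: bit N set means pCPU N may hold dirty lines. */
struct sketch_vcpu {
    uint64_t dirty_pcpus;
};

/* Record that the vCPU has run on the given pCPU since the last flush. */
static void sketch_mark_scheduled(struct sketch_vcpu *v, unsigned int pcpu)
{
    assert(pcpu < SKETCH_MAX_PCPUS);
    v->dirty_pcpus |= UINT64_C(1) << pcpu;
}

/*
 * Flush only the recorded pCPUs instead of every pCPU in the host, then
 * reset the record. Returns how many pCPUs were flushed. flush_pcpu may be
 * NULL, in which case the pCPUs are only counted.
 */
static unsigned int sketch_flush_vcpu(struct sketch_vcpu *v,
                                      void (*flush_pcpu)(unsigned int pcpu))
{
    unsigned int pcpu, flushed = 0;

    for ( pcpu = 0; pcpu < SKETCH_MAX_PCPUS; pcpu++ )
    {
        if ( !(v->dirty_pcpus & (UINT64_C(1) << pcpu)) )
            continue;
        if ( flush_pcpu )
            flush_pcpu(pcpu);
        flushed++;
    }

    v->dirty_pcpus = 0;
    return flushed;
}
```

A vCPU that ran on two pCPUs triggers two flushes here rather than one per host pCPU, which is the saving the snippet describes.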

[PATCH 2/9] x86/pv: fix emulation of wb{,no}invd to flush all pCPU caches

2025-05-06 Thread Roger Pau Monne
The current emulation of wb{,no}invd is bogus for PV guests: it will only flush the current pCPU cache, without taking into account pCPUs where the vCPU had run previously. Since there's no tracking of dirty cache pCPUs currently, resort to flushing the cache on all host pCPUs. Also as a result o

[PATCH 5/9] x86/mtrr: use memory_type_changed() in hvm_set_mem_pinned_cacheattr()

2025-05-06 Thread Roger Pau Monne
The current logic partially open-codes memory_type_changed(), but doesn't check whether the type change or the cache flush is actually needed. Instead switch to using memory_type_changed(), at a possibly higher cost, as cache flushes are no longer issued exclusively when limiting cacheability. Howev

[PATCH 0/8] xen: cache control improvements

2025-05-06 Thread Roger Pau Monne
avoid having to broadcast cache flushes on all pCPUs on x86. Thanks, Roger. Roger Pau Monne (9): x86/pv: fix MMUEXT_FLUSH_CACHE to flush all pCPU caches x86/pv: fix emulation of wb{,no}invd to flush all pCPU caches xen/gnttab: limit cache flush operation to guests allowed cache control

[PATCH 3/9] xen/gnttab: limit cache flush operation to guests allowed cache control

2025-05-06 Thread Roger Pau Monne
Whether a domain is allowed to issue cache-control operations is reported by the cache_flush_permitted() check. Introduce such check to limit the availability of GNTTABOP_cache_flush to only guests that are granted cache control. Fixes: 18e8d22fe750 ("introduce GNTTABOP_cache_flush") Signed-off-b

[PATCH 8/9] xen: introduce flag when a domain requires cache control

2025-05-06 Thread Roger Pau Monne
Such flag is added to the domain create hypercall, and a matching option is added to xl and libxl to set the flag: `cache_control`. When the flag is set, the domain is allowed the usage of cache control operations. If the flag is not explicitly set, libxl will set it if the domain has any `iomem`

[PATCH 6/9] x86/p2m: limit cache flush in memory_type_changed()

2025-05-06 Thread Roger Pau Monne
Only do the cache flush when there's a p2m type change to propagate, otherwise there's no change in the p2m effective caching attributes. If the p2m memory_type_changed hook is not set p2m_memory_type_changed() is a no-op, no recalculation of caching attributes is needed, nor flushing of the previ

[PATCH 1/9] x86/pv: fix MMUEXT_FLUSH_CACHE to flush all pCPU caches

2025-05-06 Thread Roger Pau Monne
The implementation of MMUEXT_FLUSH_CACHE is bogus, as it doesn't flush the cache of any previous pCPU where the current vCPU might have run, and hence is likely to not work as expected. Fix this by using the same logic as MMUEXT_FLUSH_CACHE_GLOBAL, which will be correct in al

[PATCH v4 4/4] x86/mm: move mmio_ro_emulated_write() to PV only file

2025-04-29 Thread Roger Pau Monne
mmio_ro_emulated_write() is only used in pv/ro-page-fault.c, move the function to that file and make it static. No functional change intended. Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich --- xen/arch/x86/include/asm/mm.h | 12 -- xen/arch/x86/mm.c | 33 -

[PATCH v4 2/4] x86/hvm: fix handling of accesses to partial r/o MMIO pages

2025-04-29 Thread Roger Pau Monne
The current logic to handle accesses to MMIO pages partially read-only is based on the (now removed) logic used to handle accesses to the r/o MMCFG region(s) for PVH v1 dom0. However that has issues when running on AMD hardware, as in that case the guest linear address that triggered the fault is

[PATCH v4 0/4] xen/x86: fix implementation of subpage r/o MMIO

2025-04-29 Thread Roger Pau Monne
the HVM subpage handler when needed. Finally patch 4 moves some PV only code to a PV specific file. Thanks, Roger. Roger Pau Monne (4): xen/io: provide helpers for multi size MMIO accesses x86/hvm: fix handling of accesses to partial r/o MMIO pages x86/hvm: only register the r/o subpage ops

[PATCH v4 3/4] x86/hvm: only register the r/o subpage ops when needed

2025-04-29 Thread Roger Pau Monne
MMIO operation handlers can be expensive to process, hence attempt to register only those that will be needed by the domain. Subpage r/o MMIO regions are added exclusively at boot, further limit their addition to strictly before the initial domain gets created, so by the time initial domain creati

[PATCH v4 1/4] xen/io: provide helpers for multi size MMIO accesses

2025-04-29 Thread Roger Pau Monne
Several handlers need to read from or write to an MMIO region using 1, 2, 4 or 8 byte accesses. So far this has been open-coded in the function itself. Instead provide a new set of handlers that encapsulate the accesses. Since the added helpers are not architecture sp
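
As a rough illustration of what helpers for 1, 2, 4 or 8 byte accesses might look like, here is a minimal sketch. The function names and signatures are hypothetical and are not taken from the patch; the sketch assumes the address is suitably aligned for the requested size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read `size` bytes (1, 2, 4 or 8) from an MMIO address. */
static uint64_t sketch_read_mmio(const volatile void *addr, unsigned int size)
{
    switch ( size )
    {
    case 1: return *(const volatile uint8_t *)addr;
    case 2: return *(const volatile uint16_t *)addr;
    case 4: return *(const volatile uint32_t *)addr;
    case 8: return *(const volatile uint64_t *)addr;
    }

    /* Any other size is a caller bug in this sketch. */
    assert(!"unsupported access size");
    return ~(uint64_t)0;
}

/* Hypothetical helper: write the low `size` bytes of `data` to an MMIO address. */
static void sketch_write_mmio(volatile void *addr, uint64_t data,
                              unsigned int size)
{
    switch ( size )
    {
    case 1: *(volatile uint8_t *)addr = (uint8_t)data; return;
    case 2: *(volatile uint16_t *)addr = (uint16_t)data; return;
    case 4: *(volatile uint32_t *)addr = (uint32_t)data; return;
    case 8: *(volatile uint64_t *)addr = data; return;
    }

    assert(!"unsupported access size");
}
```

Keeping the size switch in one place means each handler only forwards the address, the value and the size, instead of repeating the per-width casts.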

[PATCH v2] xen: fix buffer over-read in bitmap_to_xenctl_bitmap()

2025-04-25 Thread Roger Pau Monne
There's an off-by-one when calculating the last byte in the input array to bitmap_to_xenctl_bitmap(), which causes bitmaps whose size is a multiple of 8 to over-read and incorrectly use a byte past the end of the array. Fixes: 288c4641c80d ('xen: simplify bitmap_to_xenctl_bitmap for little endian') S
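
The arithmetic behind this kind of off-by-one can be sketched as follows. This is not the code from the patch: the helper names are hypothetical and the sketch only assumes that an `nbits`-bit bitmap occupies `(nbits + 7) / 8` bytes. Under that assumption, indexing the last byte as `nbits / 8` lands one past the end whenever `nbits` is a multiple of 8, while `(nbits - 1) / 8` stays in bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes needed to hold nbits bits. */
static unsigned int sketch_bitmap_bytes(unsigned int nbits)
{
    return (nbits + 7) / 8;
}

/* Index of the last byte that holds valid bits; nbits must be non-zero. */
static unsigned int sketch_last_byte_index(unsigned int nbits)
{
    assert(nbits > 0);
    return (nbits - 1) / 8;
}

/* Mask selecting the valid bits of the last byte. */
static uint8_t sketch_last_byte_mask(unsigned int nbits)
{
    unsigned int rem = nbits % 8;

    /* A remainder of 0 means the last byte is fully used. */
    return rem ? (uint8_t)((1u << rem) - 1) : 0xff;
}
```

For a 16-bit bitmap the array has two bytes, so the last valid index is 1, whereas `16 / 8` would give 2.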

[PATCH] x86/hvmloader: fix usage of NULL with cpuid_count()

2025-04-24 Thread Roger Pau Monne
The commit that added support for retrieving the APIC IDs from the APs introduced several usages of cpuid() with NULL parameters, which is not handled by the underlying implementation. For GCC I expect this results in writes to the physical address at 0, however for Clang the generated code in smp

[PATCH] xen: fix buffer over-read in bitmap_to_xenctl_bitmap()

2025-04-24 Thread Roger Pau Monne
There's an off-by-one when calculating the last byte in the input array to bitmap_to_xenctl_bitmap(), which causes bitmaps whose size is a multiple of 8 to over-read and incorrectly use a byte past the end of the array. While there also ensure that bitmap_to_xenctl_bitmap() is not called with a bitma

[PATCH v2] x86/intel: workaround several MONITOR/MWAIT errata

2025-04-23 Thread Roger Pau Monne
There are several errata on Intel regarding the usage of the MONITOR/MWAIT instructions, all having in common that stores to the monitored region might not wake up the CPU. Fix them by forcing the sending of an IPI for the affected models. The Ice Lake issue has been reproduced internally on XenS
