Re: [BUG] x2apic broken with current AMD hardware
On 16.03.2023 23:03, Elliott Mitchell wrote: > On Mon, Mar 13, 2023 at 08:01:02AM +0100, Jan Beulich wrote: >> On 11.03.2023 01:09, Elliott Mitchell wrote: >>> On Thu, Mar 09, 2023 at 10:03:23AM +0100, Jan Beulich wrote: In any event you will want to collect a serial log at maximum verbosity. It would also be of interest to know whether turning off the IOMMU avoids the issue as well (on the assumption that your system has less than 255 CPUs). >>> >>> I think I might have figured out the situation in a different fashion. >>> >>> I was taking a look at the BIOS manual for this motherboard and noticed >>> a mention of a "Local APIC Mode" setting. Four values are listed >>> "Compatibility", "xAPIC", "x2APIC", and "Auto". >>> >>> That is the sort of setting I likely left at "Auto" and that may well >>> result in x2 functionality being disabled. Perhaps the x2APIC >>> functionality on AMD is detecting whether the hardware is present, and >>> failing to test whether it has been enabled? (could be useful to output >>> a message suggesting enabling the hardware feature) >> >> Can we please move to a little more technical terms here? What is "present" >> and "enabled" in your view? I don't suppose you mean the CPUID bit (which >> we check) and the x2APIC-mode-enable one (which we drive as needed). It's >> also left unclear what the four modes of BIOS operation evaluate to. Even >> if we knew that, overriding e.g. "Compatibility" (which likely means some >> form of "disabled" / "hidden") isn't normally an appropriate thing to do. >> In "Auto" mode Xen likely should work - the only way I could interpret the >> the other modes are "xAPIC" meaning no x2APIC ACPI tables entries (and >> presumably the CPUID bit also masked), "x2APIC" meaning x2APIC mode pre- >> enabled by firmware, and "Auto" leaving it to the OS to select. Yet that's >> speculation on my part ... > > I provided the information I had discovered. There is a setting for this > motherboard (likely present on some similar motherboards) which /may/ > effect the issue. I doubt I've tried "compatibility", but none of the > values I've tried have gotten the system to boot without "x2apic=false" > on Xen's command-line. > > When setting to "x2APIC" just after "(XEN) AMD-Vi: IOMMU Extended Features:" > I see the line "(XEN) - x2APIC". Later is the line > "(XEN) x2APIC mode is already enabled by BIOS." I'll guess "Auto" > leaves the x2APIC turned off since neither line is present. When "(XEN) - x2APIC" is absent the IOMMU can't be switched into x2APIC mode. Are you sure that's the case when using "Auto"? > Both cases the line "(XEN) Switched to APIC driver x2apic_cluster" is > present (so perhaps "Auto" merely doesn't activate it). Did you also try "x2apic_phys" on the Xen command line (just to be sure this isn't a clustered-mode only issue)? > Appears error_interrupt() needs locking or some concurrency handling > mechanism since the last error is jumbled. With the setting "x2APIC" > I get a bunch of: > "(XEN) APIC error on CPU#: 00(08)(XEN) APIC error on CPU#: 00(08)" > (apparently one for each core) > Followed by "Receive accept error, Receive accept error," (again, > apparently one for each core). Then a bunch of newlines (same pattern). This is a known issue, but since the messages shouldn't appear in the first place so far no-one has bothered to address this. > With the setting "auto" the last message is a series of > "(XEN) CPU#: No irq handler for vector ## (IRQ -2147483648, LAPIC)" on > 2 different cores. Rather more of the lines were from one core, the > vector value varied some. > > Notable both sets of final error messages appeared after the Domain 0 > kernel thought it had been operating for >30 seconds. Lack of > response to interrupt generating events (pressing keys on USB keyboard) > suggests interrupts weren't getting through. > > > With "x2apic=false" error messages similar to the "Local APIC Mode" > of "x2APIC" appear >45 seconds after Domain 0 kernel start. Of note > first "(XEN) APIC error on CPU#: 00(08)(XEN) APIC error on CPU#: 00(08)" > appears for all cores with "Receive accept error,". > > Yet later a variation on this message starts appearing: > "(XEN) APIC error on CPU#: 08(08)(XEN) APIC error on CPU#: 08(08)" > this one appears multiple times. That's certainly odd, as it suggests that things were okay for a short while. > If one was to want full logs, the lack of secure communications channel > would be an issue (since filtering out identifying data is difficult). > DSA-3072 with SHA2-256 is now less than wonderful, but DSA-1024 and > ElGamal 2048 are right out. I think how good or bad my publicly available key is doesn't matter here at all. You're not asked to provide the logs to me or any other individual; you're asked to provide the logs to the community, such that anyone interested may take a stab at addressing the issue. Eventual patches may also want to refer to (parts of) such log
Re: [PATCH v2] xen/console: Skip switching serial input to non existing domains
On 16.03.2023 23:59, Stefano Stabellini wrote: > On Thu, 16 Mar 2023, Jan Beulich wrote: >> On 16.03.2023 11:26, Michal Orzel wrote: >>> --- a/xen/drivers/char/console.c >>> +++ b/xen/drivers/char/console.c >>> @@ -490,7 +490,24 @@ static void switch_serial_input(void) >>> } >>> else >>> { >>> -console_rx++; >>> +unsigned int next_rx = console_rx + 1; >>> + >>> +/* Skip switching serial input to non existing domains */ >>> +while ( next_rx < max_init_domid + 1 ) >>> +{ >>> +struct domain *d = rcu_lock_domain_by_id(next_rx - 1); >>> + >>> +if ( d ) >>> +{ >>> +rcu_unlock_domain(d); >>> +break; >>> +} >>> + >>> +next_rx++; >>> +} >>> + >>> +console_rx = next_rx; >>> + >>> printk("*** Serial input to DOM%d", console_rx - 1); >>> } >> >> While at the first glance (when you sent it in reply to v1) it looked okay, >> I'm afraid it really isn't: Please consider what happens when the last of >> the DomU-s doesn't exist anymore. (You don't really check whether it still >> exists, because the range check comes ahead of the existence one.) In that >> case you want to move from second-to-last to Xen. I expect the entire >> if/else construct wants to be inside the loop. > > I don't think we need another loop, just a check if we found a domain or I didn't say "another loop", but I suggested that the loop needs to be around the if/else. Of course this can be transformed into equivalent forms, like ... > not. E.g.: > > > unsigned int next_rx = console_rx + 1; > > /* Skip switching serial input to non existing domains */ > while ( next_rx < max_init_domid + 1 ) > { > struct domain *d = rcu_lock_domain_by_id(next_rx - 1); > > if ( d ) > { > rcu_unlock_domain(d); > console_rx = next_rx; > printk("*** Serial input to DOM%d", console_rx - 1); > break; > } > > next_rx++; > } > > /* no domain found */ > console_rx = 0; > printk("*** Serial input to Xen"); ... what you suggest (or at least almost, because the way it's written we'd always switch to Xen). Jan
Re: [PATCH v3 4/6] vpci: restrict unhandled read/write operations for guests
On Tue, Mar 14, 2023 at 08:56:30PM +, Volodymyr Babchuk wrote: > From: Oleksandr Andrushchenko > > A guest would be able to read and write those registers which are not > emulated and have no respective vPCI handlers, so it will be possible > for it to access the hardware directly. > In order to prevent a guest from reads and writes from/to the unhandled > registers make sure only hardware domain can access the hardware directly > and restrict guests from doing so. > > Suggested-by: Roger Pau Monné > Signed-off-by: Oleksandr Andrushchenko > > --- > > v3: > - No changes > > Older comments from another series: > > Since v6: > - do not use is_hwdom parameter for vpci_{read|write}_hw and use > current->domain internally > - update commit message > New in v6 > Moved into another series > --- > xen/drivers/vpci/vpci.c | 12 ++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c > index 5232f9605b..199ff55672 100644 > --- a/xen/drivers/vpci/vpci.c > +++ b/xen/drivers/vpci/vpci.c > @@ -220,6 +220,10 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned > int reg, > { > uint32_t data; > > +/* Guest domains are not allowed to read real hardware. */ > +if ( !is_hardware_domain(current->domain) ) > +return ~(uint32_t)0; > + > switch ( size ) > { > case 4: > @@ -260,9 +264,13 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned > int reg, > return data; > } > > -static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg, unsigned int > size, > - uint32_t data) > +static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg, > + unsigned int size, uint32_t data) Unrelated change? The parameter list doesn't go over 80 characters so this rearranging is not necessary, and in any case should be done in a separate commit or at least mentioned in the commit log. > { > +/* Guest domains are not allowed to write real hardware. */ I would maybe write this as: "Unprivileged domain are not allowed unhandled accesses to the config space." But that's mostly a nit, and would also apply to the comment in vpci_read_hw(). Thanks, Roger.
Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi
On 17.03.2023 00:19, Stefano Stabellini wrote: > On Thu, 16 Mar 2023, Jan Beulich wrote: >> So yes, it then all boils down to that Linux- >> internal question. > > Excellent question but we'll have to wait for Ray as he is the one with > access to the hardware. But I have this data I can share in the > meantime: > > [1.260378] IRQ to pin mappings: > [1.260387] IRQ1 -> 0:1 > [1.260395] IRQ2 -> 0:2 > [1.260403] IRQ3 -> 0:3 > [1.260410] IRQ4 -> 0:4 > [1.260418] IRQ5 -> 0:5 > [1.260425] IRQ6 -> 0:6 > [1.260432] IRQ7 -> 0:7 > [1.260440] IRQ8 -> 0:8 > [1.260447] IRQ9 -> 0:9 > [1.260455] IRQ10 -> 0:10 > [1.260462] IRQ11 -> 0:11 > [1.260470] IRQ12 -> 0:12 > [1.260478] IRQ13 -> 0:13 > [1.260485] IRQ14 -> 0:14 > [1.260493] IRQ15 -> 0:15 > [1.260505] IRQ106 -> 1:8 > [1.260513] IRQ112 -> 1:4 > [1.260521] IRQ116 -> 1:13 > [1.260529] IRQ117 -> 1:14 > [1.260537] IRQ118 -> 1:15 > [1.260544] done. And what does Linux think are IRQs 16 ... 105? Have you compared with Linux running baremetal on the same hardware? Jan > And I think Ray traced the point in Linux where Linux gives us an IRQ == > 112 (which is the one causing issues): > > __acpi_register_gsi-> > acpi_register_gsi_ioapic-> > mp_map_gsi_to_irq-> > mp_map_pin_to_irq-> > __irq_resolve_mapping() > > if (likely(data)) { > desc = irq_data_to_desc(data); > if (irq) > *irq = data->irq; > /* this IRQ is 112, IO-APIC-34 domain */ > }
Re: [PATCH v3 5/6] vpci: use reference counter to protect vpci state
On Tue, Mar 14, 2023 at 08:56:30PM +, Volodymyr Babchuk wrote: > vPCI MMIO handlers are accessing pdevs without protecting this > access with pcidevs_{lock|unlock}. This is not a problem as of now > as these are only used by Dom0. But, towards vPCI is used also for > guests, we need to properly protect pdev and pdev->vpci from being > removed while still in use. > > For that use pdev reference counting. I wonder whether vPCI does need to take another reference to the device. This all stems from me not having it fully clear how the reference counting is supposed to be used for pdevs. As mentioned in a previous patch, I would expect device assignation to take a reference, and hence vPCI won't need to take an extra refcount since vPCI can only be used once the device has been assigned to a domain, and hence already has at least an extra reference taken from the fact it's assigned to a domain. If anything I would add an ASSERT(pdev->refcount > 1) or equivalent. > > Signed-off-by: Volodymyr Babchuk > Suggested-by: Jan Beulich > > --- > > v3: > - Moved from another patch series > --- > xen/drivers/vpci/vpci.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c > index 199ff55672..005f38dc77 100644 > --- a/xen/drivers/vpci/vpci.c > +++ b/xen/drivers/vpci/vpci.c > @@ -62,6 +62,7 @@ void vpci_remove_device(struct pci_dev *pdev) > xfree(pdev->vpci->msi); > xfree(pdev->vpci); > pdev->vpci = NULL; > +pcidev_put(pdev); > } > > int vpci_add_handlers(struct pci_dev *pdev) > @@ -72,6 +73,8 @@ int vpci_add_handlers(struct pci_dev *pdev) > if ( !has_vpci(pdev->domain) ) > return 0; > > +pcidev_get(pdev); > + > /* We should not get here twice for the same device. */ > ASSERT(!pdev->vpci); You are missing a pcidev_put() in case allocation of pdev->vpci fails. Thanks, Roger.
Re: [PATCH 4/4] Update README to state Python3 requirement
On 16.03.2023 18:16, Marek Marczykowski-Górecki wrote: > Python2 is not supported anymore. There are two things here which concern me: For one, how come this is at the end of a series? You want to keep in mind that any series may be committed piecemeal (unless an indication to the contrary is in the cover letter, but there's none here in the first place). The other aspect is that there's no indication here of it being consensus that we raise the baseline requirement for Python, and for Python alone. A decision towards the wider topic of raising baseline requirements is, as you may recall from the meeting in Cambridge, still pending. Jan
Re: [PATCH v3 6/6] xen: pci: print reference counter when dumping pci_devs
On Tue, Mar 14, 2023 at 08:56:30PM +, Volodymyr Babchuk wrote: > This can be handy during new reference counter approach evaluation. > > Signed-off-by: Volodymyr Babchuk > > --- > > v3: > - Moved from another patch series > --- > xen/drivers/passthrough/pci.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > index b32382aca0..1eb79e7d01 100644 > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -1275,7 +1275,8 @@ static int cf_check _dump_pci_devices(struct pci_seg > *pseg, void *arg) > else > #endif > printk("%pd", pdev->domain); > -printk(" - node %-3d", (pdev->node != NUMA_NO_NODE) ? pdev->node : > -1); > +printk(" - node %-3d refcnt %d", (pdev->node != NUMA_NO_NODE) ? > pdev->node : -1, This line is now too long (> 80 chars), you need to add a newline between the format and the argument list. The rest LGTM. Thanks, Roger.
Re: [PATCH v8 1/5] xen: introduce CONFIG_GENERIC_BUG_FRAME
On Thu, 2023-03-16 at 12:26 +0100, Jan Beulich wrote: > On 15.03.2023 18:21, Oleksii Kurochko wrote: > > --- /dev/null > > +++ b/xen/common/bug.c > > @@ -0,0 +1,108 @@ > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > I actually meant to also ask: What is this needed for? Glancing over > the > code ... > > > +/* > > + * Returns a negative value in case of an error otherwise > > + * BUGFRAME_{run_fn, warn, bug, assert} > > + */ > > +int do_bug_frame(struct cpu_user_regs *regs, unsigned long pc) > > +{ > > + const struct bug_frame *bug = NULL; > > + const struct virtual_region *region; > > + const char *prefix = "", *filename, *predicate; > > + unsigned long fixup; > > + unsigned int id, lineno; > > + > > + region = find_text_region(pc); > > + if ( !region ) > > + return -EINVAL; > > + > > + for ( id = 0; id < BUGFRAME_NR; id++ ) > > + { > > + const struct bug_frame *b; > > + size_t i; > > + > > + for ( i = 0, b = region->frame[id].bugs; > > + i < region->frame[id].n_bugs; b++, i++ ) > > + { > > + if ( bug_loc(b) == pc ) > > + { > > + bug = b; > > + goto found; > > + } > > + } > > + } > > + > > + found: > > + if ( !bug ) > > + return -ENOENT; > > + > > + if ( id == BUGFRAME_run_fn ) > > + { > > + void (*fn)(struct cpu_user_regs *) = bug_ptr(bug); > > + > > + fn(regs); > > + > > + /* Re-enforce consistent types, because of the casts > > involved. */ > > + if ( false ) > > + run_in_exception_handler(fn); > > + > > + return id; > > + } > > + > > + /* WARN, BUG or ASSERT: decode the filename pointer and line > > number. */ > > + filename = bug_ptr(bug); > > + if ( !is_kernel(filename) && !is_patch(filename) ) > > + return -EINVAL; > > + fixup = strlen(filename); > > + if ( fixup > 50 ) > > + { > > + filename += fixup - 47; > > + prefix = "..."; > > + } > > + lineno = bug_line(bug); > > + > > + switch ( id ) > > + { > > + case BUGFRAME_warn: > > + printk("Xen WARN at %s%s:%d\n", prefix, filename, lineno); > > + show_execution_state(regs); > > + > > + break; > > + > > + case BUGFRAME_bug: > > + printk("Xen BUG at %s%s:%d\n", prefix, filename, lineno); > > + > > + if ( BUG_DEBUGGER_TRAP_FATAL(regs) ) > > + break; > > + > > + show_execution_state(regs); > > + panic("Xen BUG at %s%s:%d\n", prefix, filename, lineno); > > + > > + case BUGFRAME_assert: > > + /* ASSERT: decode the predicate string pointer. */ > > + predicate = bug_msg(bug); > > + if ( !is_kernel(predicate) && !is_patch(predicate) ) > > + predicate = ""; > > + > > + printk("Assertion '%s' failed at %s%s:%d\n", > > + predicate, prefix, filename, lineno); > > + > > + if ( BUG_DEBUGGER_TRAP_FATAL(regs) ) > > + break; > > + > > + show_execution_state(regs); > > + panic("Assertion '%s' failed at %s%s:%d\n", > > + predicate, prefix, filename, lineno); > > + } > > + > > + return id; > > +} > > ... I can't really spot what it might be that comes from that header. > Oh, on the N+1st run I've spotted it - it's show_execution_state(). > The declaration of which, already being used from common code ahead > of this series, should imo be moved to a common header. I guess I'll > make yet another patch ... As mentioned above. Not only show_execution_state() but also cpu_user_regs structure. ( at lest, for ARM & RISCV ) ~ Oleksii
Re: [PATCH v2] xen/console: Skip switching serial input to non existing domains
On 17/03/2023 09:36, Jan Beulich wrote: > > > On 16.03.2023 23:59, Stefano Stabellini wrote: >> On Thu, 16 Mar 2023, Jan Beulich wrote: >>> On 16.03.2023 11:26, Michal Orzel wrote: --- a/xen/drivers/char/console.c +++ b/xen/drivers/char/console.c @@ -490,7 +490,24 @@ static void switch_serial_input(void) } else { -console_rx++; +unsigned int next_rx = console_rx + 1; + +/* Skip switching serial input to non existing domains */ +while ( next_rx < max_init_domid + 1 ) +{ +struct domain *d = rcu_lock_domain_by_id(next_rx - 1); + +if ( d ) +{ +rcu_unlock_domain(d); +break; +} + +next_rx++; +} + +console_rx = next_rx; + printk("*** Serial input to DOM%d", console_rx - 1); } >>> >>> While at the first glance (when you sent it in reply to v1) it looked okay, >>> I'm afraid it really isn't: Please consider what happens when the last of >>> the DomU-s doesn't exist anymore. (You don't really check whether it still >>> exists, because the range check comes ahead of the existence one.) In that >>> case you want to move from second-to-last to Xen. I expect the entire >>> if/else construct wants to be inside the loop. >> >> I don't think we need another loop, just a check if we found a domain or > > I didn't say "another loop", but I suggested that the loop needs to be > around the if/else. Of course this can be transformed into equivalent > forms, like ... > >> not. E.g.: >> >> >> unsigned int next_rx = console_rx + 1; >> >> /* Skip switching serial input to non existing domains */ >> while ( next_rx < max_init_domid + 1 ) >> { >> struct domain *d = rcu_lock_domain_by_id(next_rx - 1); >> >> if ( d ) >> { >> rcu_unlock_domain(d); >> console_rx = next_rx; >> printk("*** Serial input to DOM%d", console_rx - 1); >> break; >> } >> >> next_rx++; >> } >> >> /* no domain found */ >> console_rx = 0; >> printk("*** Serial input to Xen"); > > ... what you suggest (or at least almost, because the way it's written > we'd always switch to Xen). I would prefer a loop with if/else inside. If you are ok with the following code that handles all the cases, I will push a patch in a minute: diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c index 51e5408f2114..96ec3bbcf541 100644 --- a/xen/drivers/char/console.c +++ b/xen/drivers/char/console.c @@ -483,15 +483,34 @@ struct domain *console_input_domain(void) static void switch_serial_input(void) { -if ( console_rx == max_init_domid + 1 ) -{ -console_rx = 0; -printk("*** Serial input to Xen"); -} -else +unsigned int next_rx = console_rx + 1; + +/* + * Rotate among Xen, dom0 and boot-time created domUs while skipping + * switching serial input to non existing domains. + */ +while ( next_rx <= max_init_domid + 2 ) { -console_rx++; -printk("*** Serial input to DOM%d", console_rx - 1); +if ( next_rx == max_init_domid + 2 ) +{ +console_rx = 0; +printk("*** Serial input to Xen"); +break; +} +else +{ +struct domain *d = rcu_lock_domain_by_id(next_rx - 1); + +if ( d ) +{ +rcu_unlock_domain(d); +console_rx = next_rx; +printk("*** Serial input to DOM%d", console_rx - 1); +break; +} + +next_rx++; +} } if ( switch_code ) ~Michal
Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi
On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > On 17.03.2023 00:19, Stefano Stabellini wrote: > > On Thu, 16 Mar 2023, Jan Beulich wrote: > >> So yes, it then all boils down to that Linux- > >> internal question. > > > > Excellent question but we'll have to wait for Ray as he is the one with > > access to the hardware. But I have this data I can share in the > > meantime: > > > > [1.260378] IRQ to pin mappings: > > [1.260387] IRQ1 -> 0:1 > > [1.260395] IRQ2 -> 0:2 > > [1.260403] IRQ3 -> 0:3 > > [1.260410] IRQ4 -> 0:4 > > [1.260418] IRQ5 -> 0:5 > > [1.260425] IRQ6 -> 0:6 > > [1.260432] IRQ7 -> 0:7 > > [1.260440] IRQ8 -> 0:8 > > [1.260447] IRQ9 -> 0:9 > > [1.260455] IRQ10 -> 0:10 > > [1.260462] IRQ11 -> 0:11 > > [1.260470] IRQ12 -> 0:12 > > [1.260478] IRQ13 -> 0:13 > > [1.260485] IRQ14 -> 0:14 > > [1.260493] IRQ15 -> 0:15 > > [1.260505] IRQ106 -> 1:8 > > [1.260513] IRQ112 -> 1:4 > > [1.260521] IRQ116 -> 1:13 > > [1.260529] IRQ117 -> 1:14 > > [1.260537] IRQ118 -> 1:15 > > [1.260544] done. > > And what does Linux think are IRQs 16 ... 105? Have you compared with > Linux running baremetal on the same hardware? So I have some emails from Ray from he time he was looking into this, and on Linux dom0 PVH dmesg there is: [0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec0, GSI 0-23 [0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 So it seems the vIO-APIC data provided by Xen to dom0 is at least consistent. > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > 112 (which is the one causing issues): > > > > __acpi_register_gsi-> > > acpi_register_gsi_ioapic-> > > mp_map_gsi_to_irq-> > > mp_map_pin_to_irq-> > > __irq_resolve_mapping() > > > > if (likely(data)) { > > desc = irq_data_to_desc(data); > > if (irq) > > *irq = data->irq; > > /* this IRQ is 112, IO-APIC-34 domain */ > > } Could this all be a result of patch 4/5 in the Linux series ("[RFC PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different __acpi_register_gsi hook is installed for PVH in order to setup GSIs using PHYSDEV ops instead of doing it natively from the IO-APIC? FWIW, the introduced function in that patch (acpi_register_gsi_xen_pvh()) seems to unconditionally call acpi_register_gsi_ioapic() without checking if the GSI is already registered, which might lead to multiple IRQs being allocated for the same underlying GSI? As I commented there, I think that approach is wrong. If the GSI has not been mapped in Xen (because dom0 hasn't unmasked the respective IO-APIC pin) we should add some logic in the toolstack to map it before attempting to bind. Thanks, Roger.
Re: [PATCH v3 1/6] xen: add reference counter support
On Thu, Mar 16, 2023 at 05:56:00PM +0100, Jan Beulich wrote: > On 16.03.2023 17:48, Roger Pau Monné wrote: > > On Thu, Mar 16, 2023 at 05:43:18PM +0100, Jan Beulich wrote: > >> On 16.03.2023 17:39, Roger Pau Monné wrote: > >>> On Thu, Mar 16, 2023 at 05:32:38PM +0100, Jan Beulich wrote: > On 16.03.2023 17:19, Roger Pau Monné wrote: > > On Tue, Mar 14, 2023 at 08:56:29PM +, Volodymyr Babchuk wrote: > >> +static inline void refcnt_get(refcnt_t *refcnt) > >> +{ > >> +int old = atomic_add_unless(&refcnt->refcnt, 1, 0); > > > > Occurred to me while looking at the next patch: > > > > Don't you also need to print a warning (and saturate the counter > > maybe?) if old == 0, as that would imply the caller is attempting > > to take a reference of an object that should be destroyed? IOW: it > > would point to some kind of memory leak. > > Hmm, I notice the function presently returns void. I think what to do > when the counter is zero needs leaving to the caller. See e.g. > get_page() which will simply indicate failure to the caller in case > the refcnt is zero. (There overflow handling also is left to the > caller ... All that matters is whether a ref can be acquired.) > >>> > >>> Hm, likely. I guess pages never go away even when it's refcount > >>> reaches 0. > >>> > >>> For the pdev case attempting to take a refcount on an object that has > >>> 0 refcounts implies that the caller is using leaked memory, as the > >>> point an object reaches 0 it supposed to be destroyed. > >> > >> Hmm, my thinking was that a device would remain at refcnt 0 until it is > >> actually removed, i.e. refcnt == 0 being a prereq for pci_remove_device() > >> to be willing to do anything at all. But maybe that's not a viable model. > > > > Right, I think the intention was for pci_remove_device() to drop the > > refcount to 0 and do the removal, so the refcount should be 1 when > > calling pci_remove_device(). But none of this is written down, so > > it's mostly my assumptions from looking at the code. > > Could such work at all? The function can't safely drop a reference > and _then_ check whether it was the last one. The function either has > to take refcnt == 0 as prereq, or it needs to be the destructor > function that refcnt_put() calls. But then you also get in the trouble of asserting that refcnt == 0 doesn't change between evaluation and actual removal of the structure. Should all refcounts to pdev be taken and dropped while holding the pcidevs lock? I there an email (outside of this series) that contains a description of how the refcounting is to be used with pdevs? Thanks, Roger.
Re: [PATCH v2] hvc/xen: prevent concurrent accesses to the shared ring
On Fri, Mar 10, 2023 at 10:18:32AM +0100, Roger Pau Monné wrote: > On Fri, Mar 10, 2023 at 10:01:39AM +1100, Michael Ellerman wrote: > > Roger Pau Monné writes: > > > On Mon, Dec 12, 2022 at 01:36:48PM +0100, Roger Pau Monné wrote: > > >> On Fri, Dec 02, 2022 at 12:40:05PM +0100, Roger Pau Monné wrote: > > >> > On Wed, Nov 30, 2022 at 05:08:06PM -0800, Stefano Stabellini wrote: > > >> > > On Wed, 30 Nov 2022, Roger Pau Monne wrote: > > >> > > > The hvc machinery registers both a console and a tty device based > > >> > > > on > > >> > > > the hv ops provided by the specific implementation. Those two > > >> > > > interfaces however have different locks, and there's no single > > >> > > > locks > > >> > > > that's shared between the tty and the console implementations, > > >> > > > hence > > >> > > > the driver needs to protect itself against concurrent accesses. > > >> > > > Otherwise concurrent calls using the split interfaces are likely to > > >> > > > corrupt the ring indexes, leaving the console unusable. > > >> > > > > > >> > > > Introduce a lock to xencons_info to serialize accesses to the > > >> > > > shared > > >> > > > ring. This is only required when using the shared memory console, > > >> > > > concurrent accesses to the hypercall based console implementation > > >> > > > are > > >> > > > not an issue. > > >> > > > > > >> > > > Note the conditional logic in domU_read_console() is slightly > > >> > > > modified > > >> > > > so the notify_daemon() call can be done outside of the locked > > >> > > > region: > > >> > > > it's an hypercall and there's no need for it to be done with the > > >> > > > lock > > >> > > > held. > > >> > > > > >> > > For domU_read_console: I don't mean to block this patch but we need > > >> > > to > > >> > > be sure about the semantics of hv_ops.get_chars. Either it is > > >> > > expected > > >> > > to be already locked, then we definitely shouldn't add another lock > > >> > > to > > >> > > domU_read_console. Or it is not expected to be already locked, then > > >> > > we > > >> > > should add the lock. > > >> > > > > >> > > My impression is that it is expected to be already locked, but I > > >> > > think > > >> > > we need Greg or Jiri to confirm one way or the other. > > >> > > > >> > Let me move both to the 'To:' field then. > > >> > > > >> > My main concern is the usage of hv_ops.get_chars hook in > > >> > hvc_poll_get_char(), as it's not obvious to me that callers of > > >> > tty->poll_get_char hook as returned by tty_find_polling_driver() will > > >> > always do so with the tty lock held (in fact the only user right now > > >> > doesn't seem to hold the tty lock). > > >> > > > >> > > Aside from that the rest looks fine. > > > > > > I guess I could reluctantly remove the lock in the get_chars hook, > > > albeit I'm not convinced at all the lock is not needed there. > > > > I don't know the xen driver, but other HVC backends have a lock around > > their private state in their get_chars() implementations. > > > > See eg. hvterm_raw_get_chars(). > > Yes, that was one of the motivation for adding the lock also here, and > it has already been mentioned. The other is the usage of the hooks by > callers of hvc_poll_get_char(). Ping? Thanks, Roger.
Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
On Thu, Mar 16, 2023 at 04:09:44PM -0700, Stefano Stabellini wrote: > On Thu, 16 Mar 2023, Juergen Gross wrote: > > On 16.03.23 14:53, Alex Deucher wrote: > > > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross wrote: > > > > > > > > On 16.03.23 14:45, Alex Deucher wrote: > > > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich wrote: > > > > > > > > > > > > On 16.03.2023 00:25, Stefano Stabellini wrote: > > > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote: > > > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote: > > > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote: > > > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote: > > > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of > > > > > > > > > > > hardware > > > > > > > > > > > virtualization support when possible. It will using the > > > > > > > > > > > hardware IOMMU > > > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if > > > > > > > > > > > current domain is > > > > > > > > > > > Xen PVH. > > > > > > > > > > > > > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how > > > > > > > > > > can > > > > > > > > > > it get > > > > > > > > > > away without resorting to swiotlb in certain cases (like I/O > > > > > > > > > > to an > > > > > > > > > > address-restricted device)? > > > > > > > > > > > > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, > > > > > > > > > there > > > > > > > > > is no > > > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by > > > > > > > > > the IOMMU > > > > > > > > > so we can use guest physical addresses instead of machine > > > > > > > > > addresses for > > > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is > > > > > > > > > available > > > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the > > > > > > > > > corresponding > > > > > > > > > case is XENFEAT_not_direct_mapped). > > > > > > > > > > > > > > > > But how does Xen using an IOMMU help with, as said, > > > > > > > > address-restricted > > > > > > > > devices? They may still need e.g. a 32-bit address to be > > > > > > > > programmed in, > > > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O > > > > > > > > buffers > > > > > > > > may fulfill this requirement. > > > > > > > > > > > > > > In short, it is going to work as long as Linux has guest physical > > > > > > > addresses (not machine addresses, those could be anything) lower > > > > > > > than > > > > > > > 4GB. > > > > > > > > > > > > > > If the address-restricted device does DMA via an IOMMU, then the > > > > > > > device > > > > > > > gets programmed by Linux using its guest physical addresses (not > > > > > > > machine > > > > > > > addresses). > > > > > > > > > > > > > > The 32-bit restriction would be applied by Linux to its choice of > > > > > > > guest > > > > > > > physical address to use to program the device, the same way it > > > > > > > does > > > > > > > on > > > > > > > native. The device would be fine as it always uses Linux-provided > > > > > > > <4GB > > > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), > > > > > > > we > > > > > > > could get any address, including >4GB addresses, and that is > > > > > > > expected to > > > > > > > work. > > > > > > > > > > > > I understand that's the "normal" way of working. But whatever the > > > > > > swiotlb > > > > > > is used for in baremetal Linux, that would similarly require its use > > > > > > in > > > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look > > > > > > to > > > > > > me like an incomplete attempt to disable its use altogether on x86. > > > > > > What > > > > > > difference of PVH vs baremetal am I missing here? > > > > > > > > > > swiotlb is not usable for GPUs even on bare metal. They often have > > > > > hundreds or megs or even gigs of memory mapped on the device at any > > > > > given time. Also, AMD GPUs support 44-48 bit DMA masks (depending on > > > > > the chip family). > > > > > > > > But the swiotlb isn't per device, but system global. > > > > > > Sure, but if the swiotlb is in use, then you can't really use the GPU. > > > So you get to pick one. > > > > The swiotlb is used only for buffers which are not within the DMA mask of a > > device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA > > mask > > won't use the swiotlb unless you have a buffer above guest physical address > > of > > 16TB (so basically never). > > > > Disabling swiotlb in such a guest would OTOH mean, that a device with only > > 32 bit DMA mask passed through to this guest couldn't work with buffers > > above 4GB. > > > > I don't think this is acceptable. > > From the Xen subsystem in Linux point of view, the only thing we need to > do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not > the global swiotlb) on PVH because it is not needed anyway. But this is already the case o
[PATCH] Fix PCI hotplug AML
From: David Woodhouse The emulated PIIX3 uses a nybble for the status of each PCI function, so the status for e.g. slot 0 functions 0 and 1 respectively can be read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04). The AML that Xen gives to a guest gets the operand order for the odd- numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00) instead. As far as I can tell, this was the wrong way round in Xen from the moment that PCI hotplug was first introduced in commit 83d82e6f35a8: +ShiftRight (0x4, \_GPE.PH00, Local1) +Return (Local1) /* IN status as the _STA */ Or maybe there's bizarre AML operand ordering going on there, like Intel's wrong-way-round assembler, and it only broke later when it was changed to being generated? Either way, it's definitely wrong now, and instrumenting a Linux guest shows that it correctly sees _STA being 0x00 in function 0 of an empty slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to look at function 1 and sees that _STA evaluates to 0x04. Thus reporting an adapter is present in every slot in /sys/bus/pci/slots/* Quite why Linux wants to look for function 1 being physically present when function 0 isn't... I don't want to think about right now. Signed-off-by: David Woodhouse Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug") --- Utterly untested in Xen. Tested the same change in a different environment which is using precisely the *same* AML for guest compatibility. diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c index 1176da80ef..1d27809116 100644 --- a/tools/libacpi/mk_dsdt.c +++ b/tools/libacpi/mk_dsdt.c @@ -431,7 +431,7 @@ int main(int argc, char **argv) stmt("Store", "0x89, \\_GPE.DPT2"); } if ( slot & 1 ) -stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1); +stmt("ShiftRight", "\\_GPE.PH%02X, 0x04, Local1", slot & ~1); else stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1); stmt("Return", "Local1"); /* IN status as the _STA */ smime.p7s Description: S/MIME cryptographic signature
Re: [RFC PATCH v2 2/2] tools/misc: Add xen-vcpus-stats tool
On Thu, Feb 23, 2023 at 08:31:29PM +, Julien Grall wrote: > Hi, > > On 23/02/2023 16:01, Andrew Cooper wrote: > > On 07/10/2022 1:39 pm, Matias Ezequiel Vara Larsen wrote: > > > > A couple of observations, all unrelated to the stats themselves. > > > > Although overall, I'm not entirely certain that a tool like this is > > going to be very helpful after initial development. Something to > > consider would be to alter libxenstat to use this new interface? > > > > > diff --git a/tools/misc/Makefile b/tools/misc/Makefile > > > index 2b683819d4..837e4b50da 100644 > > > --- a/tools/misc/Makefile > > > +++ b/tools/misc/Makefile > > > @@ -49,6 +49,7 @@ TARGETS_COPY += xenpvnetboot > > > > > > # Everything which needs to be built > > > TARGETS_BUILD := $(filter-out $(TARGETS_COPY),$(TARGETS_ALL)) > > > +TARGETS_BUILD += xen-vcpus-stats > > > > This patch is whitespace corrupted. If at all possible, you need to see > > about getting `git send-email` working to send patches with, as it deals > > with most of the whitespace problems for you. > > > > I'm afraid you can't simply copy the patch text into an email and send that. > > > > > > > > # ... including build-only targets > > > TARGETS_BUILD-$(CONFIG_X86) += xen-vmtrace > > > @@ -135,4 +136,9 @@ xencov: xencov.o > > > xen-ucode: xen-ucode.o > > > $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) > > > > > > +xen-vcpus-stats.o: CFLAGS += $(CFLAGS_libxenforeginmemory) > > > + > > > +xen-vcpus-stats: xen-vcpus-stats.o > > > + $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) > > > $(LDLIBS_libxenforeignmemory) $(APPEND_LDFLAGS) > > > + > > > -include $(DEPS_INCLUDE) > > > diff --git a/tools/misc/xen-vcpus-stats.c b/tools/misc/xen-vcpus-stats.c > > > new file mode 100644 > > > index 00..29d0efb124 > > > --- /dev/null > > > +++ b/tools/misc/xen-vcpus-stats.c > > > @@ -0,0 +1,87 @@ > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > + > > > +#include > > > +#include > > > +#include > > > + > > > +#define rmb() asm volatile("lfence":::"memory") > > > > This is rmb(), but rmb() isn't what you want. > > > > You want smp_rmb(), which is > > > > #define smp_rmb() asm volatile ("" ::: "memory") > > From the generic PoV, I find smp_rmb() a bit misleading because it is not > clear in this context whether we are referring to the SMP-ness of the > hypervisor or the tools domain. > > If the latter, then technically it could be uniprocessor domain and one > could argue that for Arm it could be downgraded to just a compiler barrier. > > AFAICT, this would not be the case here because we are getting data from > Xen. So we always need a "dmb ish". > > So, I would suggest to name it virt_*() (to match Linux's naming). > > Also, is this tool meant to be arch-agnostic? If so, then we need to > introduce the proper barrier for the other arch. > Thanks Julien for the comment. Is it `xen_rmb()` meant for that? Matias
[xen-unstable test] 179697: tolerable trouble: fail/pass/starved - PUSHED
flight 179697 xen-unstable real [real] flight 179714 xen-unstable real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/179697/ http://logs.test-lab.xenproject.org/osstest/logs/179714/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-install fail pass in 179714-retest Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 179668 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 179668 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 179668 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 179668 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 179668 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 179668 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 179668 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 179668 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 179668 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass build-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-examine 1 build-check(1) starved n/a test-armhf-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) starved n/a test-armhf-armhf-libvirt-raw 1 build-check(1) starved n/a test-armhf-armhf-xl 1 build-check(1) starved n/a test-armhf-armhf-xl-credit1 1 build-check(1) starved n/a test-armhf-armhf-xl-credit2 1 build-check(1) starved n/a test-armhf-armhf-xl-cubietruck 1 build-check(1) starved n/a test-armhf-armhf-xl-multivcpu 1 build-check(1) starved n/a test-armhf-armhf-xl-rtds 1 build-check(1) starved n/a test-armhf-armhf-xl-vhd 1 build-check(1) starved n/a build-armhf 2 hosts-allocate starved n/a version targeted for testing: xen 36e49fc8cbad21a4308b4701caaa685ef04e120b baseline version: xen b2ea81d2b935474cf27a76b4aa143ae897e82457 Last test of basis 179668 2023-03-16 06:30:55 Z1 days Testing same since 179697 2023-03-16 20:45:20 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Henry Wang Jan Beulich Jason Andryuk jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64-xtf pass build-amd64 pass build-
Re: [XEN PATCH v2] x86/monitor: Add new monitor event to catch I/O instructions
On 15/03/2023 6:54 pm, Dmitry Isaykin wrote: > diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c > index a43bcf2e92..49225f48a7 100644 > --- a/xen/arch/x86/hvm/svm/svm.c > +++ b/xen/arch/x86/hvm/svm/svm.c > @@ -2939,17 +2939,26 @@ void svm_vmexit_handler(void) > break; > > case VMEXIT_IOIO: > -if ( (vmcb->exitinfo1 & (1u<<2)) == 0 ) > + { > +uint16_t port = (vmcb->exitinfo1 >> 16) & 0x; > +int bytes = ((vmcb->exitinfo1 >> 4) & 0x07); > +int dir = (vmcb->exitinfo1 & 1) ? IOREQ_READ : IOREQ_WRITE; > +bool string_ins = (vmcb->exitinfo1 & (1u<<2)); > +int rc = hvm_monitor_io(port, bytes, dir, string_ins); > +if ( rc < 0 ) > +goto unexpected_exit_type; > +if ( !rc ) > { > -uint16_t port = (vmcb->exitinfo1 >> 16) & 0x; > -int bytes = ((vmcb->exitinfo1 >> 4) & 0x07); > -int dir = (vmcb->exitinfo1 & 1) ? IOREQ_READ : IOREQ_WRITE; > -if ( handle_pio(port, bytes, dir) ) > -__update_guest_eip(regs, vmcb->exitinfo2 - vmcb->rip); > +if ( !string_ins ) > +{ > +if ( handle_pio(port, bytes, dir) ) > +__update_guest_eip(regs, vmcb->exitinfo2 - vmcb->rip); > +} > +else if ( !hvm_emulate_one_insn(x86_insn_is_portio, "port I/O") ) > +hvm_inject_hw_exception(TRAP_gp_fault, 0); > } > -else if ( !hvm_emulate_one_insn(x86_insn_is_portio, "port I/O") ) > -hvm_inject_hw_exception(TRAP_gp_fault, 0); > break; > +} There are a few style issues, but it's also a mess because of the manual exitinfo decoding, so I went ahead and did https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=df9369154aa010b2322e3f3e0727a242784cfd4f to clean it up. The rebased version of this hunk is now: diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index bfe03316def6..17ac99f6cd56 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2939,6 +2939,15 @@ void svm_vmexit_handler(void) break; case VMEXIT_IOIO: + rc = hvm_monitor_io(vmcb->ei.io.port, + vmcb->ei.io.bytes, + vmcb->ei.io.in ? IOREQ_READ : IOREQ_WRITE, + vmcb->ei.io.str); + if ( rc < 0 ) + goto unexpected_exit_type; + if ( rc ) + break; + if ( !vmcb->ei.io.str ) { if ( handle_pio(vmcb->ei.io.port, which I hope you'll agree is much more simple to follow. I'm also trying to sort out a similar cleanup on the Intel side, but haven't managed to post that yet. ~Andrew
[PATCH 0/2] tools/xl: small cleanup of parsing code
2 small patches cleaning up the parsing code in xl a little bit. Juergen Gross (2): tools/xl: allow split_string_into_pair() to trim values tools/xl: rework p9 config parsing tools/xl/xl_parse.c | 114 +--- tools/xl/xl_parse.h | 4 +- 2 files changed, 56 insertions(+), 62 deletions(-) -- 2.35.3
[PATCH 1/2] tools/xl: allow split_string_into_pair() to trim values
Most use cases of split_string_into_pair() are requiring the returned strings to be white space trimmed. In order to avoid the same code pattern multiple times, add a predicate parameter to split_string_into_pair() which can be specified to call trim() with that predicate for the string pair returned. Specifying NULL for the predicate will avoid the call of trim(). Signed-off-by: Juergen Gross --- tools/xl/xl_parse.c | 42 +++--- tools/xl/xl_parse.h | 4 ++-- 2 files changed, 21 insertions(+), 25 deletions(-) diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index 853e9f357a..2f9dfea05c 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -646,7 +646,7 @@ static void parse_vnuma_config(const XLU_Config *config, conf_count++) { if (xlu_cfg_value_type(conf_option) == XLU_STRING) { -char *buf, *option_untrimmed, *value_untrimmed; +char *buf; char *option, *value; unsigned long val; @@ -654,15 +654,12 @@ static void parse_vnuma_config(const XLU_Config *config, if (!buf) continue; -if (split_string_into_pair(buf, "=", - &option_untrimmed, - &value_untrimmed)) { +if (split_string_into_pair(buf, "=", &option, &value, + isspace)) { fprintf(stderr, "xl: failed to split \"%s\" into pair\n", buf); exit(EXIT_FAILURE); } -trim(isspace, option_untrimmed, &option); -trim(isspace, value_untrimmed, &value); if (!strcmp("pnode", option)) { val = parse_ulong(value); @@ -715,8 +712,6 @@ static void parse_vnuma_config(const XLU_Config *config, } free(option); free(value); -free(option_untrimmed); -free(value_untrimmed); } } } @@ -838,7 +833,7 @@ int parse_vdispl_config(libxl_device_vdispl *vdispl, char *token) rc = split_string_into_pair(connectors[i], ":", &vdispl->connectors[i].unique_id, -&resolution); +&resolution, NULL); rc= sscanf(resolution, "%ux%u", &vdispl->connectors[i].width, &vdispl->connectors[i].height); @@ -2292,18 +2287,15 @@ void parse_config_data(const char *config_source, split_string_into_string_list(buf, ",", &pairs); len = libxl_string_list_length(&pairs); for (i = 0; i < len; i++) { -char *key, *key_untrimmed, *value, *value_untrimmed; +char *key, *value; int rc; -rc = split_string_into_pair(pairs[i], "=", -&key_untrimmed, -&value_untrimmed); +rc = split_string_into_pair(pairs[i], "=", &key, &value, +isspace); if (rc != 0) { fprintf(stderr, "failed to parse channel configuration: %s", pairs[i]); exit(1); } -trim(isspace, key_untrimmed, &key); -trim(isspace, value_untrimmed, &value); if (!strcmp(key, "backend")) { replace_string(&chn->backend_domname, value); @@ -2326,9 +2318,7 @@ void parse_config_data(const char *config_source, " ignoring\n", key); } free(key); -free(key_untrimmed); free(value); -free(value_untrimmed); } switch (chn->connection) { case LIBXL_CHANNEL_CONNECTION_UNKNOWN: @@ -2952,10 +2942,8 @@ void trim(char_predicate_t predicate, const char *input, char **output) *output = result; } -int split_string_into_pair(const char *str, - const char *delim, - char **a, - char **b) +int split_string_into_pair(const char *str, const char *delim, + char **a, char **b, char_predicate_t predicate) { char *s, *p, *saveptr, *aa = NULL, *bb = NULL; int rc = 0; @@ -2967,13 +2955,21 @@ int split_string_into_pair(const char *str, rc = ERROR_INVAL; goto out; } -aa = xstrdup(p); +if (predicate) { +trim(predicate, p, &aa); +} else { +aa = xstrdup(p); +} p = strtok_r(NULL, delim, &saveptr); if (p == NULL) { rc = ERROR_INVAL; goto out; } -bb = xstrdup(p); +if
[PATCH 2/2] tools/xl: rework p9 config parsing
Rework the config parsing of a p9 device to use the split_string_into_pair() function instead of open coding it. Signed-off-by: Juergen Gross --- tools/xl/xl_parse.c | 72 ++--- 1 file changed, 35 insertions(+), 37 deletions(-) diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index 2f9dfea05c..715e14f95f 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -2111,54 +2111,52 @@ void parse_config_data(const char *config_source, if (!xlu_cfg_get_list(config, "p9", &p9devs, 0, 0)) { libxl_device_p9 *p9; -char *security_model = NULL; -char *path = NULL; -char *tag = NULL; -char *backend = NULL; -char *p, *p2, *buf2; d_config->num_p9s = 0; d_config->p9s = NULL; while ((buf = xlu_cfg_get_listitem (p9devs, d_config->num_p9s)) != NULL) { +libxl_string_list pairs; +int len; + p9 = ARRAY_EXTEND_INIT(d_config->p9s, d_config->num_p9s, libxl_device_p9_init); libxl_device_p9_init(p9); -buf2 = strdup(buf); -p = strtok(buf2, ","); -if(p) { - do { - while(*p == ' ') - ++p; - if ((p2 = strchr(p, '=')) == NULL) - break; - *p2 = '\0'; - if (!strcmp(p, "security_model")) { - security_model = strdup(p2 + 1); - } else if(!strcmp(p, "path")) { - path = strdup(p2 + 1); - } else if(!strcmp(p, "tag")) { - tag = strdup(p2 + 1); - } else if(!strcmp(p, "backend")) { - backend = strdup(p2 + 1); - } else { - fprintf(stderr, "Unknown string `%s' in 9pfs spec\n", p); - exit(1); - } - } while ((p = strtok(NULL, ",")) != NULL); -} -if (!path || !security_model || !tag) { - fprintf(stderr, "9pfs spec missing required field!\n"); - exit(1); +split_string_into_string_list(buf, ",", &pairs); +len = libxl_string_list_length(&pairs); +for (i = 0; i < len; i++) { +char *key, *value; +int rc; + +rc = split_string_into_pair(pairs[i], "=", &key, &value, +isspace); +if (rc != 0) { +fprintf(stderr, "failed to parse 9pfs configuration: %s", +pairs[i]); +exit(1); +} + +if (!strcmp(key, "security_model")) { +replace_string(&p9->security_model, value); +} else if (!strcmp(key, "path")) { +replace_string(&p9->path, value); +} else if (!strcmp(key, "tag")) { +replace_string(&p9->tag, value); +} else if (!strcmp(key, "backend")) { +replace_string(&p9->backend_domname, value); +} else { +fprintf(stderr, "Unknown 9pfs parameter '%s'\n", key); +exit(1); +} +free(key); +free(value); } -free(buf2); -replace_string(&p9->tag, tag); -replace_string(&p9->security_model, security_model); -replace_string(&p9->path, path); -if (backend) -replace_string(&p9->backend_domname, backend); +if (!p9->path || !p9->security_model || !p9->tag) { +fprintf(stderr, "9pfs spec missing required field!\n"); +exit(1); +} } } -- 2.35.3
Re: [PATCH v2] x86: use POPCNT for hweight() when available
On Mon, Jul 15, 2019 at 02:39:04PM +, Jan Beulich wrote: > This is faster than using the software implementation, and the insn is > available on all half-way recent hardware. Therefore convert > generic_hweight() to out-of-line functions (without affecting Arm) > and use alternatives patching to replace the function calls. > > Note that the approach doesn#t work for clang, due to it not recognizing > -ffixed-*. I've been giving this a look, and I wonder if it would be fine to simply push and pop the scratch registers in the 'call' path of the alternative, as that won't require any specific compiler option. Thanks, Roger.
Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote: > On 26.05.2020 17:01, Andrew Cooper wrote: > > On 26/05/2020 14:35, Jan Beulich wrote: > >> On 26.05.2020 13:17, Andrew Cooper wrote: > >>> On 26/05/2020 07:49, Jan Beulich wrote: > Respective Core Gen10 processor lines are affected, too. > > Signed-off-by: Jan Beulich > > --- a/xen/arch/x86/mm.c > +++ b/xen/arch/x86/mm.c > @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g > case 0x000506e0: /* errata SKL167 / SKW159 */ > case 0x000806e0: /* erratum KBL??? */ > case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */ > +case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */ > +case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */ > >>> This is marred in complexity. > >>> > >>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural > >>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE > >>> no longer exists/works. This applies to IceLake systems, but possibly > >>> not their initial release configuration (hence, via a later microcode > >>> update). > >>> > >>> HLE is also disabled in microcode on all older parts for errata reasons, > >>> so in practice it doesn't exist anywhere now. > >>> > >>> I think it is safe to drop this workaround, and this does seem a more > >>> simple option than encoding which microcode turned HLE off (which sadly > >>> isn't covered by the spec updates, as even when turned off, HLE is still > >>> functioning according to its spec of "may speed things up, may do > >>> nothing"), or the interactions with the CPUID hiding capabilities of > >>> MSR_TSX_CTRL. > >> I'm afraid I don't fully follow: For one, does what you say imply HLE is > >> no longer enumerated in CPUID? > > > > No - sadly not. For reasons of "not repeating the Haswell/Broadwell > > microcode fiasco", the HLE bit will continue to exist and be set. > > (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.) > > > > It was always a weird CPUID bit. You were supposed to put > > XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop > > on old hardware and go faster on newer hardware. > > > > There is nothing runtime code needs to look at the HLE bit for, except > > perhaps for UI reporting purposes. > > Do you know of some public Intel doc I could reference for all of this, > which I would kind of need in the description of a patch ... > > >> But then this > >> erratum does not have the usual text effectively meaning that an ucode > >> update is or will be available to address the issue; instead it says > >> that BIOS or VMM can reserve the respective address range. > > > > This is not surprising at all. Turning off HLE was an unrelated > > activity, and I bet the link went unnoticed. > > > >> This - assuming the alternative you describe is indeed viable - then is > >> surely > >> a much more intrusive workaround than needed. Which I wouldn't assume > >> they would suggest in such a case. > > > > My suggestion was to drop the workaround, not to complicated it with a > > microcode revision matrix. > > ... doing this? I don't think I've seen any of this in writing so far, > except by you. (I don't understand how this reply of yours relates to > what I was saying about the spec update. I understand what you are > suggesting. I merely tried to express that I'd have expected Intel to > point out the much easier workaround, rather than just a pretty involved > one.) Otherwise, may I suggest you make such a patch, to make sure it > has an adequate description? Seeing as there seems to be some data missing to justify the commit - was has Linux done with those erratas? Thanks, Roger.
[XEN PATCH v3] x86/monitor: Add new monitor event to catch I/O instructions
Adds monitor support for I/O instructions. Signed-off-by: Dmitry Isaykin Signed-off-by: Anton Belousov Acked-by: Tamas K Lengyel --- Changes in v3: * Rebase on staging * Refactor branch logic on monitor_traps response Changes in v2: * Handled INS and OUTS instructions too * Added I/O monitoring support for AMD * Rename functions and structures (remove "_instruction" part) * Reorder parameters of hvm_monitor_io to match handle_pio's order * Change type of string_ins parameter to bool * Change vm_event_io structure * Handle monitor_traps's return status --- tools/include/xenctrl.h| 1 + tools/libs/ctrl/xc_monitor.c | 13 + xen/arch/x86/hvm/monitor.c | 21 + xen/arch/x86/hvm/svm/svm.c | 8 xen/arch/x86/hvm/vmx/vmx.c | 21 ++--- xen/arch/x86/include/asm/domain.h | 1 + xen/arch/x86/include/asm/hvm/monitor.h | 3 +++ xen/arch/x86/include/asm/monitor.h | 3 ++- xen/arch/x86/monitor.c | 13 + xen/include/public/domctl.h| 1 + xen/include/public/vm_event.h | 10 ++ 11 files changed, 91 insertions(+), 4 deletions(-) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 23037874d3..05967ecc92 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -2102,6 +2102,7 @@ int xc_monitor_emul_unimplemented(xc_interface *xch, uint32_t domain_id, bool enable); int xc_monitor_vmexit(xc_interface *xch, uint32_t domain_id, bool enable, bool sync); +int xc_monitor_io(xc_interface *xch, uint32_t domain_id, bool enable); /** * This function enables / disables emulation for each REP for a * REP-compatible instruction. diff --git a/tools/libs/ctrl/xc_monitor.c b/tools/libs/ctrl/xc_monitor.c index c5fa62ff30..3cb96f444f 100644 --- a/tools/libs/ctrl/xc_monitor.c +++ b/tools/libs/ctrl/xc_monitor.c @@ -261,6 +261,19 @@ int xc_monitor_vmexit(xc_interface *xch, uint32_t domain_id, bool enable, return do_domctl(xch, &domctl); } +int xc_monitor_io(xc_interface *xch, uint32_t domain_id, bool enable) +{ +DECLARE_DOMCTL; + +domctl.cmd = XEN_DOMCTL_monitor_op; +domctl.domain = domain_id; +domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE +: XEN_DOMCTL_MONITOR_OP_DISABLE; +domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_IO; + +return do_domctl(xch, &domctl); +} + /* * Local variables: * mode: C diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c index a11cd76f4d..ff958b6c05 100644 --- a/xen/arch/x86/hvm/monitor.c +++ b/xen/arch/x86/hvm/monitor.c @@ -346,6 +346,27 @@ int hvm_monitor_vmexit(unsigned long exit_reason, return monitor_traps(curr, ad->monitor.vmexit_sync, &req); } +int hvm_monitor_io(uint16_t port, unsigned int bytes, + int dir, bool string_ins) +{ +struct vcpu *curr = current; +struct arch_domain *ad = &curr->domain->arch; +vm_event_request_t req = {}; + +if ( !ad->monitor.io_enabled ) +return 0; + +req.reason = VM_EVENT_REASON_IO_INSTRUCTION; +req.u.io.data_size = bytes; +req.u.io.port = port; +req.u.io.dir = dir; +req.u.io.string_ins = string_ins; + +set_npt_base(curr, &req); + +return monitor_traps(curr, true, &req); +} + /* * Local variables: * mode: C diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index bfe03316de..b55f825f5d 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2939,6 +2939,14 @@ void svm_vmexit_handler(void) break; case VMEXIT_IOIO: +rc = hvm_monitor_io(vmcb->ei.io.port, +vmcb->ei.io.bytes, +vmcb->ei.io.in ? IOREQ_READ : IOREQ_WRITE, +vmcb->ei.io.str); +if ( rc < 0 ) +goto unexpected_exit_type; +if ( rc ) +break; if ( !vmcb->ei.io.str ) { if ( handle_pio(vmcb->ei.io.port, diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 00b531f76c..8a278f16c5 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -4560,7 +4560,24 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) break; case EXIT_REASON_IO_INSTRUCTION: +{ +uint16_t port; +int bytes, dir; +bool string_ins; +int rc; + __vmread(EXIT_QUALIFICATION, &exit_qualification); + +port = (exit_qualification >> 16) & 0x; +bytes = (exit_qualification & 0x07) + 1; +dir = (exit_qualification & 0x08) ? IOREQ_READ : IOREQ_WRITE; +string_ins = (exit_qualification & 0x10); +rc = hvm_monitor_io(port, bytes, dir, string_ins); +if ( rc < 0 ) +goto exit_and_crash; +if ( rc ) +break; +
Re: [PATCH 4/4] Update README to state Python3 requirement
On Fri, Mar 17, 2023 at 09:46:33AM +0100, Jan Beulich wrote: > On 16.03.2023 18:16, Marek Marczykowski-Górecki wrote: > > Python2 is not supported anymore. > > There are two things here which concern me: For one, how come this is > at the end of a series? You want to keep in mind that any series may > be committed piecemeal (unless an indication to the contrary is in > the cover letter, but there's none here in the first place). > > The other aspect is that there's no indication here of it being > consensus that we raise the baseline requirement for Python, and for > Python alone. A decision towards the wider topic of raising baseline > requirements is, as you may recall from the meeting in Cambridge, > still pending. Hmm, in fact the only part of this series that isn't python2 compatible anymore is "install-python-bindings" target in tools/libs/stat/Makefile. And it's enabled only with XENSTAT_PYTHON_BINDING=y is explicitly set. So, maybe this readme change isn't relevant at all, at least not yet. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
[ovmf test] 179713: all pass - PUSHED
flight 179713 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/179713/ Perfect :-) All tests in this flight passed as required version targeted for testing: ovmf 410ca0ff94a42ee541dd6ceab70ea974eeb7e500 baseline version: ovmf 0e5717009779ec6c1e35f979426a2cd407b3e73a Last test of basis 179705 2023-03-17 04:10:46 Z0 days Testing same since 179713 2023-03-17 08:15:15 Z0 days1 attempts People who touched revisions under test: Gang Chen jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/osstest/ovmf.git 0e57170097..410ca0ff94 410ca0ff94a42ee541dd6ceab70ea974eeb7e500 -> xen-tested-master
Re: [PATCH v2] x86: use POPCNT for hweight() when available
On 17/03/2023 11:22 am, Roger Pau Monné wrote: > On Mon, Jul 15, 2019 at 02:39:04PM +, Jan Beulich wrote: >> This is faster than using the software implementation, and the insn is >> available on all half-way recent hardware. Therefore convert >> generic_hweight() to out-of-line functions (without affecting Arm) >> and use alternatives patching to replace the function calls. >> >> Note that the approach doesn#t work for clang, due to it not recognizing >> -ffixed-*. > I've been giving this a look, and I wonder if it would be fine to > simply push and pop the scratch registers in the 'call' path of the > alternative, as that won't require any specific compiler option. It's been a long while, and in that time I've learnt a lot more about performance, but my root objection to the approach taken here still stands - it is penalising the common case to optimise some pointless corner cases. Yes - on the call path, an extra push/pop pair (or few) to get temp registers is basically free. Right now, the generic_hweight() helpers are static inline in xen/bitops.h and this is nonsense. The absolute best they should be is extern inline in our new lib/ and I'll bet that the compilers stop inlining them there and then. Given new abi's like x86_64-v2 (which guarantees POPCNT as an available feature), it would be nice to arrange to use __builtin_popcnt() to let the compiler optimise to its hearts content, but outside of this case, it is actively damaging to try and optimise for memory operands or to inline the 8/16 case. So, for x86 specifically, we want: if ( CONFIG_POPCNT ) __builtin_popcnt() else ALT( "popcnt D -> a", "call arch_popcnt$N" ) and we can write arch_popcnt* in x86's lib/ and in assembly, because these are trivial enough and we do need to be careful with registers/etc. I'm not sure if a "+D" vs "D" will matter at the top level. Probably not, so it might be an easy way to get one tempt register. Other temp registers can come from push/pop. While we're at it, we should split hweight out of bitops and write the common header in such a way that it defaults to the generic implementations in lib/, and that will subsume the ARM header and also make this work on RISC-V for free. ~Andrew
Re: [PATCH 4/4] Update README to state Python3 requirement
On Fri, Mar 17, 2023 at 8:46 AM Jan Beulich wrote: > On 16.03.2023 18:16, Marek Marczykowski-Górecki wrote: > > Python2 is not supported anymore. > > There are two things here which concern me: For one, how come this is > at the end of a series? You want to keep in mind that any series may > be committed piecemeal (unless an indication to the contrary is in > the cover letter, but there's none here in the first place). > > The other aspect is that there's no indication here of it being > consensus that we raise the baseline requirement for Python, and for > Python alone. A decision towards the wider topic of raising baseline > requirements is, as you may recall from the meeting in Cambridge, > still pending. > To me, the idea behind that discussion was that if we agree on a policy -- or at least general principles -- then we can avoid having to have discussions on a case-by-case basis. The fact that the discussion is still open isn't a reason not to deprecate features; it just means that we still need to discuss each one on its merits. Given that Python 2 was announced unsupported years ago now, it seems obvious to me that we should stop trying to support it. Are you arguing that we *should* continue to support Python2? Do you think anyone will? If so, please either make that case, or recommend that we raise the profile of this so that others can. -George
Re: [PATCH v4] acpi/processor: fix evaluating _PDC method when running as Xen dom0
On 16.03.23 17:42, Roger Pau Monne wrote: In ACPI systems, the OS can direct power management, as opposed to the firmware. This OS-directed Power Management is called OSPM. Part of telling the firmware that the OS going to direct power management is making ACPI "_PDC" (Processor Driver Capabilities) calls. These _PDC methods must be evaluated for every processor object. If these _PDC calls are not completed for every processor it can lead to inconsistency and later failures in things like the CPU frequency driver. In a Xen system, the dom0 kernel is responsible for system-wide power management. The dom0 kernel is in charge of OSPM. However, the number of CPUs available to dom0 can be different than the number of CPUs physically present on the system. This leads to a problem: the dom0 kernel needs to evaluate _PDC for all the processors, but it can't always see them. In dom0 kernels, ignore the existing ACPI method for determining if a processor is physically present because it might not be accurate. Instead, ask the hypervisor for this information. Fix this by introducing a custom function to use when running as Xen dom0 in order to check whether a processor object matches a CPU that's online. Such checking is done using the existing information fetched by the Xen pCPU subsystem, extending it to also store the ACPI ID. This ensures that _PDC method gets evaluated for all physically online CPUs, regardless of the number of CPUs made available to dom0. Fixes: 5d554a7bb064 ('ACPI: processor: add internal processor_physically_present()') Signed-off-by: Roger Pau Monné Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
[xen-unstable-smoke test] 179716: tolerable trouble: pass/starved - PUSHED
flight 179716 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/179716/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 1 build-check(1) starved n/a build-armhf 2 hosts-allocate starved n/a version targeted for testing: xen 9c0061825143716c61622966e76983886ef59361 baseline version: xen 36e49fc8cbad21a4308b4701caaa685ef04e120b Last test of basis 179687 2023-03-16 15:00:53 Z0 days Testing same since 179716 2023-03-17 11:01:55 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Anthony PERARD Marek Marczykowski-Górecki jobs: build-arm64-xsm pass build-amd64 pass build-armhf starved build-amd64-libvirt pass test-armhf-armhf-xl starved test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 36e49fc8cb..9c00618251 9c0061825143716c61622966e76983886ef59361 -> smoke
[PATCH v3 02/10] xen/arm: add SVE vector length field to the domain
Add sve_vl_bits field to arch_domain and sve_vl field to xen_arch_domainconfig struct, to allow the domain to have an information about the SVE feature and the number of SVE register bits that are allowed for this domain. sve_vl field is the vector length in bits divided by 128, this allows to use less space in the xen_arch_domainconfig struct. The field is used also to allow or forbid a domain to use SVE, because a value equal to zero means the guest is not allowed to use the feature. Check that the requested vector length is lower or equal to the platform supported vector length, otherwise fail on domain creation. Signed-off-by: Luca Fancellu --- Changes from v2: - rename field in xen_arch_domainconfig from "sve_vl_bits" to "sve_vl" and use the implicit padding after gic_version to store it, now this field is the VL/128. (Jan) - Created domainconfig_decode_vl() function to decode the sve_vl field and use it as plain bits value inside arch_domain. - Changed commit message reflecting the changes Changes from v1: - no changes Changes from RFC: - restore zcr_el2 in sve_restore_state, that will be introduced later in this serie, so remove zcr_el2 related code from this patch and move everything to the later patch (Julien) - add explicit padding into struct xen_arch_domainconfig (Julien) - Don't lower down the vector length, just fail to create the domain. (Julien) --- xen/arch/arm/arm64/sve.c | 9 xen/arch/arm/domain.c| 32 xen/arch/arm/include/asm/arm64/sve.h | 18 xen/arch/arm/include/asm/domain.h| 5 + xen/include/public/arch-arm.h| 2 ++ xen/include/public/domctl.h | 2 +- 6 files changed, 67 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/arm64/sve.c b/xen/arch/arm/arm64/sve.c index c466de61b47f..4a4ff5dbef49 100644 --- a/xen/arch/arm/arm64/sve.c +++ b/xen/arch/arm/arm64/sve.c @@ -6,6 +6,7 @@ */ #include +#include #include #include #include @@ -48,3 +49,11 @@ register_t vl_to_zcr(uint16_t vl) ASSERT(vl > 0); return ((vl / SVE_VL_MULTIPLE_VAL) - 1U) & ZCR_ELx_LEN_MASK; } + +/* Get the system sanitized value for VL in bits */ +uint16_t get_sys_vl_len(void) +{ +/* ZCR_ELx len field is ((len+1) * 128) = vector bits length */ +return ((system_cpuinfo.zcr64.bits[0] & ZCR_ELx_LEN_MASK) + 1U) * +SVE_VL_MULTIPLE_VAL; +} diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index adb6ace2e24d..470e8607fabe 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -13,6 +13,7 @@ #include #include +#include #include #include #include @@ -550,6 +551,8 @@ int arch_vcpu_create(struct vcpu *v) v->arch.vmpidr = MPIDR_SMP | vcpuid_to_vaffinity(v->vcpu_id); v->arch.cptr_el2 = get_default_cptr_flags(); +if ( is_sve_domain(v->domain) ) +v->arch.cptr_el2 &= ~HCPTR_CP(8); v->arch.hcr_el2 = get_default_hcr_flags(); @@ -594,6 +597,7 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) unsigned int max_vcpus; unsigned int flags_required = (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap); unsigned int flags_optional = (XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_vpmu); +unsigned int sve_vl_bits = domainconfig_decode_vl(config->arch.sve_vl); if ( (config->flags & ~flags_optional) != flags_required ) { @@ -602,6 +606,31 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) return -EINVAL; } +/* Check feature flags */ +if ( sve_vl_bits > 0 ) { +unsigned int zcr_max_bits = get_sys_vl_len(); + +if ( !cpu_has_sve ) +{ +dprintk(XENLOG_INFO, "SVE is unsupported on this machine.\n"); +return -EINVAL; +} +else if ( !is_vl_valid(sve_vl_bits) ) +{ +dprintk(XENLOG_INFO, "Unsupported SVE vector length (%u)\n", +sve_vl_bits); +return -EINVAL; +} +else if ( sve_vl_bits > zcr_max_bits ) +{ +dprintk(XENLOG_INFO, +"The requested SVE vector length (%u) must be lower or \n" +"equal to the platform supported vector length (%u)\n", +sve_vl_bits, zcr_max_bits); +return -EINVAL; +} +} + /* The P2M table must always be shared between the CPU and the IOMMU */ if ( config->iommu_opts & XEN_DOMCTL_IOMMU_no_sharept ) { @@ -744,6 +773,9 @@ int arch_domain_create(struct domain *d, if ( (rc = domain_vpci_init(d)) != 0 ) goto fail; +/* Copy and decode sve_vl from the domain configuration */ +d->arch.sve_vl_bits = domainconfig_decode_vl(config->arch.sve_vl); + return 0; fail: diff --git a/xen/arch/arm/include/asm/arm64/sve.h b/xen/arch/arm/include/asm/arm64/sve.h index bd56e2f24230..4b0b941c243b 100644 --- a/xen/arch/arm/include/asm/arm64/sve.h +++ b/x
[PATCH v3 01/10] xen/arm: enable SVE extension for Xen
Enable Xen to handle the SVE extension, add code in cpufeature module to handle ZCR SVE register, disable trapping SVE feature on system boot only when SVE resources are accessed. While there, correct coding style for the comment on coprocessor trapping. Now cptr_el2 is part of the domain context and it will be restored on context switch, this is a preparation for saving the SVE context which will be part of VFP operations, so restore it before the call to save VFP registers. To save an additional isb barrier, restore cptr_el2 before an existing isb barrier and move the call for saving VFP context after that barrier. Change the KConfig entry to make ARM64_SVE symbol selectable, by default it will be not selected. Create sve module and sve_asm.S that contains assembly routines for the SVE feature, this code is inspired from linux and it uses instruction encoding to be compatible with compilers that does not support SVE. Signed-off-by: Luca Fancellu --- Changes from v2: - renamed sve_asm.S in sve-asm.S, new files should not contain underscore in the name (Jan) Changes from v1: - Add assert to vl_to_zcr, it is never called with vl==0, but just to be sure it won't in the future. Changes from RFC: - Moved restoring of cptr before an existing barrier (Julien) - Marked the feature as unsupported for now (Julien) - Trap and un-trap only when using SVE resources in compute_max_zcr() (Julien) --- xen/arch/arm/Kconfig | 10 +++-- xen/arch/arm/arm64/Makefile | 1 + xen/arch/arm/arm64/cpufeature.c | 7 ++-- xen/arch/arm/arm64/sve-asm.S | 48 +++ xen/arch/arm/arm64/sve.c | 50 xen/arch/arm/cpufeature.c| 6 ++- xen/arch/arm/domain.c| 9 +++-- xen/arch/arm/include/asm/arm64/sve.h | 43 xen/arch/arm/include/asm/arm64/sysregs.h | 1 + xen/arch/arm/include/asm/cpufeature.h| 14 +++ xen/arch/arm/include/asm/domain.h| 1 + xen/arch/arm/include/asm/processor.h | 2 + xen/arch/arm/setup.c | 5 ++- xen/arch/arm/traps.c | 28 +++-- 14 files changed, 201 insertions(+), 24 deletions(-) create mode 100644 xen/arch/arm/arm64/sve-asm.S create mode 100644 xen/arch/arm/arm64/sve.c create mode 100644 xen/arch/arm/include/asm/arm64/sve.h diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig index 239d3aed3c7f..41f45d8d1203 100644 --- a/xen/arch/arm/Kconfig +++ b/xen/arch/arm/Kconfig @@ -112,11 +112,15 @@ config ARM64_PTR_AUTH This feature is not supported in Xen. config ARM64_SVE - def_bool n + bool "Enable Scalar Vector Extension support (UNSUPPORTED)" if UNSUPPORTED depends on ARM_64 help - Scalar Vector Extension support. - This feature is not supported in Xen. + Scalar Vector Extension (SVE/SVE2) support for guests. + + Please be aware that currently, enabling this feature will add latency on + VM context switch between SVE enabled guests, between not-enabled SVE + guests and SVE enabled guests and viceversa, compared to the time + required to switch between not-enabled SVE guests. config ARM64_MTE def_bool n diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile index 6d507da0d44d..24e08fd42596 100644 --- a/xen/arch/arm/arm64/Makefile +++ b/xen/arch/arm/arm64/Makefile @@ -12,6 +12,7 @@ obj-y += insn.o obj-$(CONFIG_LIVEPATCH) += livepatch.o obj-y += smc.o obj-y += smpboot.o +obj-$(CONFIG_ARM64_SVE) += sve.o sve-asm.o obj-y += traps.o obj-y += vfp.o obj-y += vsysreg.o diff --git a/xen/arch/arm/arm64/cpufeature.c b/xen/arch/arm/arm64/cpufeature.c index d9039d37b2d1..b4656ff4d80f 100644 --- a/xen/arch/arm/arm64/cpufeature.c +++ b/xen/arch/arm/arm64/cpufeature.c @@ -455,15 +455,11 @@ static const struct arm64_ftr_bits ftr_id_dfr1[] = { ARM64_FTR_END, }; -#if 0 -/* TODO: use this to sanitize SVE once we support it */ - static const struct arm64_ftr_bits ftr_zcr[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ZCR_ELx_LEN_SHIFT, ZCR_ELx_LEN_SIZE, 0),/* LEN */ ARM64_FTR_END, }; -#endif /* * Common ftr bits for a 32bit register with all hidden, strict @@ -603,6 +599,9 @@ void update_system_features(const struct cpuinfo_arm *new) SANITIZE_ID_REG(zfr64, 0, aa64zfr0); + if ( cpu_has_sve ) + SANITIZE_REG(zcr64, 0, zcr); + /* * Comment from Linux: * Userspace may perform DC ZVA instructions. Mismatched block sizes diff --git a/xen/arch/arm/arm64/sve-asm.S b/xen/arch/arm/arm64/sve-asm.S new file mode 100644 index ..4d1549344733 --- /dev/null +++ b/xen/arch/arm/arm64/sve-asm.S @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Arm SVE assembly routines + * + * Copyright (C) 2022 ARM Ltd.
[PATCH v3 03/10] xen/arm: Expose SVE feature to the guest
When a guest is allowed to use SVE, expose the SVE features through the identification registers. Signed-off-by: Luca Fancellu --- Changes from v2: - no changes Changes from v1: - No changes Changes from RFC: - No changes --- xen/arch/arm/arm64/vsysreg.c | 39 ++-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c index 758750983c11..10048bb4d221 100644 --- a/xen/arch/arm/arm64/vsysreg.c +++ b/xen/arch/arm/arm64/vsysreg.c @@ -18,6 +18,7 @@ #include +#include #include #include #include @@ -295,7 +296,28 @@ void do_sysreg(struct cpu_user_regs *regs, GENERATE_TID3_INFO(MVFR0_EL1, mvfr, 0) GENERATE_TID3_INFO(MVFR1_EL1, mvfr, 1) GENERATE_TID3_INFO(MVFR2_EL1, mvfr, 2) -GENERATE_TID3_INFO(ID_AA64PFR0_EL1, pfr64, 0) + +case HSR_SYSREG_ID_AA64PFR0_EL1: +{ +register_t guest_reg_value = guest_cpuinfo.pfr64.bits[0]; + +if ( is_sve_domain(v->domain) ) +{ +/* 4 is the SVE field width in id_aa64pfr0_el1 */ +uint64_t mask = GENMASK(ID_AA64PFR0_SVE_SHIFT + 4 - 1, +ID_AA64PFR0_SVE_SHIFT); +/* sysval is the sve field on the system */ +uint64_t sysval = cpuid_feature_extract_unsigned_field_width( +system_cpuinfo.pfr64.bits[0], +ID_AA64PFR0_SVE_SHIFT, 4); +guest_reg_value &= ~mask; +guest_reg_value |= (sysval << ID_AA64PFR0_SVE_SHIFT) & mask; +} + +return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 1, + guest_reg_value); +} + GENERATE_TID3_INFO(ID_AA64PFR1_EL1, pfr64, 1) GENERATE_TID3_INFO(ID_AA64DFR0_EL1, dbg64, 0) GENERATE_TID3_INFO(ID_AA64DFR1_EL1, dbg64, 1) @@ -306,7 +328,20 @@ void do_sysreg(struct cpu_user_regs *regs, GENERATE_TID3_INFO(ID_AA64MMFR2_EL1, mm64, 2) GENERATE_TID3_INFO(ID_AA64AFR0_EL1, aux64, 0) GENERATE_TID3_INFO(ID_AA64AFR1_EL1, aux64, 1) -GENERATE_TID3_INFO(ID_AA64ZFR0_EL1, zfr64, 0) + +case HSR_SYSREG_ID_AA64ZFR0_EL1: +{ +/* + * When the guest has the SVE feature enabled, the whole id_aa64zfr0_el1 + * needs to be exposed. + */ +register_t guest_reg_value = guest_cpuinfo.zfr64.bits[0]; +if ( is_sve_domain(v->domain) ) +guest_reg_value = system_cpuinfo.zfr64.bits[0]; + +return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 1, + guest_reg_value); +} /* * Those cases are catching all Reserved registers trapped by TID3 which -- 2.34.1
[PATCH v3 07/10] xen/physinfo: encode Arm SVE vector length in arch_capabilities
When the arm platform supports SVE, advertise the feature in the field arch_capabilities in struct xen_sysctl_physinfo by encoding the SVE vector length in it. Signed-off-by: Luca Fancellu --- Changes from v2: - Remove XEN_SYSCTL_PHYSCAP_ARM_SVE_SHFT, use MASK_INSR and protect with ifdef XEN_SYSCTL_PHYSCAP_ARM_SVE_MASK (Jan) - Use the helper function sve_arch_cap_physinfo to encode the VL into physinfo arch_capabilities field. Changes from v1: - Use only arch_capabilities and some defines to encode SVE VL (Bertrand, Stefano, Jan) Changes from RFC: - new patch --- xen/arch/arm/arm64/sve.c | 12 xen/arch/arm/include/asm/arm64/sve.h | 2 ++ xen/arch/arm/sysctl.c| 3 +++ xen/include/public/sysctl.h | 4 4 files changed, 21 insertions(+) diff --git a/xen/arch/arm/arm64/sve.c b/xen/arch/arm/arm64/sve.c index 6593b59b58e8..409b029e4d21 100644 --- a/xen/arch/arm/arm64/sve.c +++ b/xen/arch/arm/arm64/sve.c @@ -117,3 +117,15 @@ void sve_restore_state(struct vcpu *v) sve_load_ctx(sve_ctx_zreg_end, v->arch.vfp.fpregs, 1); } + +void sve_arch_cap_physinfo(uint32_t *arch_capabilities) +{ +if ( cpu_has_sve ) +{ +/* Vector length is divided by 128 to save some space */ +uint32_t sve_vl = MASK_INSR(domainconfig_encode_vl(get_sys_vl_len()), +XEN_SYSCTL_PHYSCAP_ARM_SVE_MASK); + +*arch_capabilities |= sve_vl; +} +} diff --git a/xen/arch/arm/include/asm/arm64/sve.h b/xen/arch/arm/include/asm/arm64/sve.h index 8654b0fac4bc..34b4c39191cb 100644 --- a/xen/arch/arm/include/asm/arm64/sve.h +++ b/xen/arch/arm/include/asm/arm64/sve.h @@ -41,6 +41,7 @@ int sve_context_init(struct vcpu *v); void sve_context_free(struct vcpu *v); void sve_save_state(struct vcpu *v); void sve_restore_state(struct vcpu *v); +void sve_arch_cap_physinfo(uint32_t *arch_capabilities); #else /* !CONFIG_ARM64_SVE */ @@ -69,6 +70,7 @@ static inline int sve_context_init(struct vcpu *v) static inline void sve_context_free(struct vcpu *v) {} static inline void sve_save_state(struct vcpu *v) {} static inline void sve_restore_state(struct vcpu *v) {} +static inline void sve_arch_cap_physinfo(uint32_t *arch_capabilities) {} #endif diff --git a/xen/arch/arm/sysctl.c b/xen/arch/arm/sysctl.c index b0a78a8b10d0..64e4d3e06a6b 100644 --- a/xen/arch/arm/sysctl.c +++ b/xen/arch/arm/sysctl.c @@ -11,11 +11,14 @@ #include #include #include +#include #include void arch_do_physinfo(struct xen_sysctl_physinfo *pi) { pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm | XEN_SYSCTL_PHYSCAP_hap; + +sve_arch_cap_physinfo(&pi->arch_capabilities); } long arch_do_sysctl(struct xen_sysctl *sysctl, diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index e8dded9fb94a..99ea3fa0740b 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -94,6 +94,10 @@ struct xen_sysctl_tbuf_op { /* Max XEN_SYSCTL_PHYSCAP_* constant. Used for ABI checking. */ #define XEN_SYSCTL_PHYSCAP_MAX XEN_SYSCTL_PHYSCAP_gnttab_v2 +#ifdef __aarch64__ +#define XEN_SYSCTL_PHYSCAP_ARM_SVE_MASK (0x1FU) +#endif + struct xen_sysctl_physinfo { uint32_t threads_per_core; uint32_t cores_per_socket; -- 2.34.1
[PATCH v3 04/10] xen/arm: add SVE exception class handling
SVE has a new exception class with code 0x19, introduce the new code and handle the exception. Signed-off-by: Luca Fancellu --- Changes from v2: - No changes Changes from v1: - No changes Changes from RFC: - No changes --- xen/arch/arm/include/asm/processor.h | 1 + xen/arch/arm/traps.c | 12 2 files changed, 13 insertions(+) diff --git a/xen/arch/arm/include/asm/processor.h b/xen/arch/arm/include/asm/processor.h index 0e38926b94db..625c2bd0cd6c 100644 --- a/xen/arch/arm/include/asm/processor.h +++ b/xen/arch/arm/include/asm/processor.h @@ -426,6 +426,7 @@ #define HSR_EC_HVC640x16 #define HSR_EC_SMC640x17 #define HSR_EC_SYSREG 0x18 +#define HSR_EC_SVE 0x19 #endif #define HSR_EC_INSTR_ABORT_LOWER_EL 0x20 #define HSR_EC_INSTR_ABORT_CURR_EL 0x21 diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index a78a99ddadd0..c2e30feafd5a 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -2160,6 +2160,13 @@ void do_trap_guest_sync(struct cpu_user_regs *regs) perfc_incr(trap_sysreg); do_sysreg(regs, hsr); break; +case HSR_EC_SVE: +GUEST_BUG_ON(regs_mode_is_32bit(regs)); +gprintk(XENLOG_WARNING, +"Domain id %d tried to use SVE while not allowed\n", +current->domain->domain_id); +inject_undef_exception(regs, hsr); +break; #endif case HSR_EC_INSTR_ABORT_LOWER_EL: @@ -2189,6 +2196,11 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs) case HSR_EC_BRK: do_trap_brk(regs, hsr); break; +case HSR_EC_SVE: +/* An SVE exception is a bug somewhere in hypervisor code */ +printk("SVE trap at EL2.\n"); +do_unexpected_trap("Hypervisor", regs); +break; #endif case HSR_EC_DATA_ABORT_CURR_EL: case HSR_EC_INSTR_ABORT_CURR_EL: -- 2.34.1
[PATCH v3 08/10] tools: add physinfo arch_capabilities handling for Arm
On Arm, the SVE vector length is encoded in arch_capabilities field of struct xen_sysctl_physinfo, make use of this field in the tools when building for arm. Signed-off-by: Luca Fancellu --- Changes from v2: - rename arm_arch_capabilities.h in arm-arch-capabilities.h, use MASK_EXTR. - Now arm-arch-capabilities.h needs MASK_EXTR macro, but it is defined in libxl_internal.h, it doesn't feel right to include that header so move MASK_EXTR into xen-tools/libs.h that is also included in libxl_internal.h Changes from v1: - now SVE VL is encoded in arch_capabilities on Arm Changes from RFC: - new patch --- tools/golang/xenlight/helpers.gen.go | 2 ++ tools/golang/xenlight/types.gen.go| 1 + tools/include/arm-arch-capabilities.h | 33 +++ tools/include/xen-tools/libs.h| 2 ++ tools/libs/light/libxl.c | 1 + tools/libs/light/libxl_internal.h | 1 - tools/libs/light/libxl_types.idl | 1 + tools/ocaml/libs/xc/xenctrl.ml| 4 +--- tools/ocaml/libs/xc/xenctrl.mli | 4 +--- tools/ocaml/libs/xc/xenctrl_stubs.c | 8 --- tools/python/xen/lowlevel/xc/xc.c | 8 +-- tools/xl/xl_info.c| 8 +++ 12 files changed, 61 insertions(+), 12 deletions(-) create mode 100644 tools/include/arm-arch-capabilities.h diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go index 0a203d22321f..35397be2f9e2 100644 --- a/tools/golang/xenlight/helpers.gen.go +++ b/tools/golang/xenlight/helpers.gen.go @@ -3506,6 +3506,7 @@ x.CapVmtrace = bool(xc.cap_vmtrace) x.CapVpmu = bool(xc.cap_vpmu) x.CapGnttabV1 = bool(xc.cap_gnttab_v1) x.CapGnttabV2 = bool(xc.cap_gnttab_v2) +x.ArchCapabilities = uint32(xc.arch_capabilities) return nil} @@ -3540,6 +3541,7 @@ xc.cap_vmtrace = C.bool(x.CapVmtrace) xc.cap_vpmu = C.bool(x.CapVpmu) xc.cap_gnttab_v1 = C.bool(x.CapGnttabV1) xc.cap_gnttab_v2 = C.bool(x.CapGnttabV2) +xc.arch_capabilities = C.uint32_t(x.ArchCapabilities) return nil } diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go index a7c17699f80e..3d968a496744 100644 --- a/tools/golang/xenlight/types.gen.go +++ b/tools/golang/xenlight/types.gen.go @@ -1079,6 +1079,7 @@ CapVmtrace bool CapVpmu bool CapGnttabV1 bool CapGnttabV2 bool +ArchCapabilities uint32 } type Connectorinfo struct { diff --git a/tools/include/arm-arch-capabilities.h b/tools/include/arm-arch-capabilities.h new file mode 100644 index ..e07384652b14 --- /dev/null +++ b/tools/include/arm-arch-capabilities.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023 ARM Ltd. + */ + +#ifndef ARM_ARCH_CAPABILITIES_H +#define ARM_ARCH_CAPABILITIES_H + +/* Tell the Xen public headers we are a user-space tools build. */ +#ifndef __XEN_TOOLS__ +#define __XEN_TOOLS__ 1 +#endif + +#include +#include + +#include + +static inline +unsigned int arch_capabilities_arm_sve(unsigned int arch_capabilities) +{ +#if defined(__aarch64__) +unsigned int sve_vl = MASK_EXTR(arch_capabilities, +XEN_SYSCTL_PHYSCAP_ARM_SVE_MASK); + +/* Vector length is divided by 128 before storing it in arch_capabilities */ +return sve_vl * 128U; +#else +return 0; +#endif +} + +#endif /* ARM_ARCH_CAPABILITIES_H */ diff --git a/tools/include/xen-tools/libs.h b/tools/include/xen-tools/libs.h index bafc90e2f603..8850dcef4b7f 100644 --- a/tools/include/xen-tools/libs.h +++ b/tools/include/xen-tools/libs.h @@ -63,6 +63,8 @@ #define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1)) #endif +#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m))) + #ifndef __must_check #define __must_check __attribute__((__warn_unused_result__)) #endif diff --git a/tools/libs/light/libxl.c b/tools/libs/light/libxl.c index a0bf7d186f69..175d6dde0b80 100644 --- a/tools/libs/light/libxl.c +++ b/tools/libs/light/libxl.c @@ -409,6 +409,7 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo) !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_gnttab_v1); physinfo->cap_gnttab_v2 = !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_gnttab_v2); +physinfo->arch_capabilities = xcphysinfo.arch_capabilities; GC_FREE; return 0; diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h index ad982d691ab9..e8c1b2c2d3e7 100644 --- a/tools/libs/light/libxl_internal.h +++ b/tools/libs/light/libxl_internal.h @@ -132,7 +132,6 @@ #define DIV_ROUNDUP(n, d) (((n) + (d) - 1) / (d)) -#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m))) #define MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m)) #define LIBXL__LOGGING_ENABLED diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl index c10292e0d7e3..fd31dacf7d5a 100644 --- a/tools/libs/light/libxl_types.idl +++ b/tools/libs/light/libxl_types.idl @@ -1133,6 +1133,7 @@ libxl_physinfo = Struct("phy
[PATCH v3 00/10] SVE feature for arm guests
This serie is introducing the possibility for Dom0 and DomU guests to use sve/sve2 instructions. SVE feature introduces new instruction and registers to improve performances on floating point operations. The SVE feature is advertised using the ID_AA64PFR0_EL1 register, SVE field, and when available the ID_AA64ZFR0_EL1 register provides additional information about the implemented version and other SVE feature. New registers added by the SVE feature are Z0-Z31, P0-P15, FFR, ZCR_ELx. Z0-Z31 are scalable vector register whose size is implementation defined and goes from 128 bits to maximum 2048, the term vector length will be used to refer to this quantity. P0-P15 are predicate registers and the size is the vector length divided by 8, same size is the FFR (First Fault Register). ZCR_ELx is a register that can control and restrict the maximum vector length used by the exception level and all the lower exception levels, so for example EL3 can restrict the vector length usable by EL3,2,1,0. The platform has a maximum implemented vector length, so for every value written in ZCR register, if this value is above the implemented length, then the lower value will be used. The RDVL instruction can be used to check what vector length is the HW using after setting ZCR. For an SVE guest, the V0-V31 registers are part of the Z0-Z31, so there is no need to save them separately, saving Z0-Z31 will save implicitly also V0-V31. SVE usage can be trapped using a flag in CPTR_EL2, hence in this serie the register is added to the domain state, to be able to trap only the guests that are not allowed to use SVE. This serie is introducing a command line parameter to enable Dom0 to use SVE and to set its maximum vector length that by default is 0 which means the guest is not allowed to use SVE. Values from 128 to 2048 mean the guest can use SVE with the selected value used as maximum allowed vector length (which could be lower if the implemented one is lower). For DomUs, an XL parameter with the same way of use is introduced and a dom0less DTB binding is created. The context switch is the most critical part because there can be big registers to be saved, in this serie an easy approach is used and the context is saved/restored every time for the guests that are allowed to use SVE. Luca Fancellu (10): xen/arm: enable SVE extension for Xen xen/arm: add SVE vector length field to the domain xen/arm: Expose SVE feature to the guest xen/arm: add SVE exception class handling arm/sve: save/restore SVE context switch xen/arm: enable Dom0 to use SVE feature xen/physinfo: encode Arm SVE vector length in arch_capabilities tools: add physinfo arch_capabilities handling for Arm xen/tools: add sve parameter in XL configuration xen/arm: add sve property for dom0less domUs docs/man/xl.cfg.5.pod.in | 11 ++ docs/misc/arm/device-tree/booting.txt| 9 ++ docs/misc/xen-command-line.pandoc| 13 ++ tools/golang/xenlight/helpers.gen.go | 4 + tools/golang/xenlight/types.gen.go | 2 + tools/include/arm-arch-capabilities.h| 33 tools/include/libxl.h| 5 + tools/include/xen-tools/libs.h | 2 + tools/libs/light/libxl.c | 1 + tools/libs/light/libxl_arm.c | 2 + tools/libs/light/libxl_internal.h| 1 - tools/libs/light/libxl_types.idl | 2 + tools/ocaml/libs/xc/xenctrl.ml | 4 +- tools/ocaml/libs/xc/xenctrl.mli | 4 +- tools/ocaml/libs/xc/xenctrl_stubs.c | 8 +- tools/python/xen/lowlevel/xc/xc.c| 8 +- tools/xl/xl_info.c | 8 + tools/xl/xl_parse.c | 26 +++- xen/arch/arm/Kconfig | 10 +- xen/arch/arm/arm64/Makefile | 1 + xen/arch/arm/arm64/cpufeature.c | 7 +- xen/arch/arm/arm64/sve-asm.S | 189 +++ xen/arch/arm/arm64/sve.c | 131 xen/arch/arm/arm64/vfp.c | 79 ++ xen/arch/arm/arm64/vsysreg.c | 39 - xen/arch/arm/cpufeature.c| 6 +- xen/arch/arm/domain.c| 48 +- xen/arch/arm/domain_build.c | 11 ++ xen/arch/arm/include/asm/arm64/sve.h | 85 ++ xen/arch/arm/include/asm/arm64/sysregs.h | 4 + xen/arch/arm/include/asm/arm64/vfp.h | 10 ++ xen/arch/arm/include/asm/cpufeature.h| 14 ++ xen/arch/arm/include/asm/domain.h| 8 + xen/arch/arm/include/asm/processor.h | 3 + xen/arch/arm/setup.c | 5 +- xen/arch/arm/sysctl.c| 3 + xen/arch/arm/traps.c | 40 +++-- xen/include/public/arch-arm.h| 2 + xen/include/public/domctl.h | 2 +- xen/include/public/sysctl.h | 4 + 40 files changed, 769 insertions(+), 75 deletions(-) create mode 100644 tools/include/arm-arch-ca
[PATCH v3 06/10] xen/arm: enable Dom0 to use SVE feature
Add a command line parameter to allow Dom0 the use of SVE resources, the command line parameter dom0_sve controls the feature on this domain and sets the maximum SVE vector length for Dom0. Signed-off-by: Luca Fancellu --- Changes from v2: - xen_domctl_createdomain field has changed into sve_vl and its value now is the VL / 128, create an helper function for that. Changes from v1: - No changes Changes from RFC: - Changed docs to explain that the domain won't be created if the requested vector length is above the supported one from the platform. --- docs/misc/xen-command-line.pandoc| 13 + xen/arch/arm/arm64/sve.c | 5 + xen/arch/arm/domain_build.c | 4 xen/arch/arm/include/asm/arm64/sve.h | 9 + 4 files changed, 31 insertions(+) diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index e0b89b7d3319..595e0d17183c 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -1005,6 +1005,19 @@ restrictions set up here. Note that the values to be specified here are ACPI PXM ones, not Xen internal node numbers. `relaxed` sets up vCPU affinities to prefer but be not limited to the specified node(s). +### dom0_sve (arm) +> `= ` + +> Default: `0` + +Enable arm SVE usage for Dom0 domain and sets the maximum SVE vector length. +Values above 0 means feature is enabled for Dom0, otherwise feature is disabled. +Possible values are from 0 to maximum 2048, being multiple of 128, that will be +the maximum vector length. +Please note that the platform can supports a lower value, if the requested value +is above the supported one, the domain creation will fail and the system will +stop. + ### dom0_vcpus_pin > `= ` diff --git a/xen/arch/arm/arm64/sve.c b/xen/arch/arm/arm64/sve.c index 1b95a3cbadd1..6593b59b58e8 100644 --- a/xen/arch/arm/arm64/sve.c +++ b/xen/arch/arm/arm64/sve.c @@ -5,10 +5,15 @@ * Copyright (C) 2022 ARM Ltd. */ +#include #include #include #include +/* opt_dom0_sve: allow Dom0 to use SVE and set maximum vector length. */ +unsigned int __initdata opt_dom0_sve; +integer_param("dom0_sve", opt_dom0_sve); + extern unsigned int sve_get_hw_vl(void); extern void sve_save_ctx(uint64_t *sve_ctx, uint64_t *pregs, int save_ffr); extern void sve_load_ctx(uint64_t const *sve_ctx, uint64_t const *pregs, diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c index 9707eb7b1bb1..aa5ff52b90c2 100644 --- a/xen/arch/arm/domain_build.c +++ b/xen/arch/arm/domain_build.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -4084,6 +4085,9 @@ void __init create_dom0(void) if ( iommu_enabled ) dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu; +if ( opt_dom0_sve > 0 ) +dom0_cfg.arch.sve_vl = domainconfig_encode_vl(opt_dom0_sve); + dom0 = domain_create(0, &dom0_cfg, CDF_privileged | CDF_directmap); if ( IS_ERR(dom0) ) panic("Error creating domain 0 (rc = %ld)\n", PTR_ERR(dom0)); diff --git a/xen/arch/arm/include/asm/arm64/sve.h b/xen/arch/arm/include/asm/arm64/sve.h index ced010f42dad..8654b0fac4bc 100644 --- a/xen/arch/arm/include/asm/arm64/sve.h +++ b/xen/arch/arm/include/asm/arm64/sve.h @@ -25,8 +25,15 @@ static inline uint16_t domainconfig_decode_vl(uint8_t sve_vl) return sve_vl * SVE_VL_MULTIPLE_VAL; } +static inline uint8_t domainconfig_encode_vl(uint16_t sve_vl_bits) +{ +return sve_vl_bits / SVE_VL_MULTIPLE_VAL; +} + #ifdef CONFIG_ARM64_SVE +extern unsigned int opt_dom0_sve; + register_t compute_max_zcr(void); register_t vl_to_zcr(uint16_t vl); uint16_t get_sys_vl_len(void); @@ -37,6 +44,8 @@ void sve_restore_state(struct vcpu *v); #else /* !CONFIG_ARM64_SVE */ +#define opt_dom0_sve (0) + static inline register_t compute_max_zcr(void) { return 0; -- 2.34.1
[PATCH v3 10/10] xen/arm: add sve property for dom0less domUs
Add a device tree property in the dom0less domU configuration to enable the guest to use SVE. Update documentation. Signed-off-by: Luca Fancellu --- Changes from v2: - xen_domctl_createdomain field name has changed into sve_vl and its value is the VL/128, use domainconfig_encode_vl to encode a plain VL in bits. Changes from v1: - No changes Changes from RFC: - Changed documentation --- docs/misc/arm/device-tree/booting.txt | 9 + xen/arch/arm/domain_build.c | 7 +++ 2 files changed, 16 insertions(+) diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt index 3879340b5e0a..d74bf9ab1c8b 100644 --- a/docs/misc/arm/device-tree/booting.txt +++ b/docs/misc/arm/device-tree/booting.txt @@ -193,6 +193,15 @@ with the following properties: Optional. Handle to a xen,cpupool device tree node that identifies the cpupool where the guest will be started at boot. +- sve + +Optional. A number that, when above 0, enables SVE for this guest and sets +its maximum SVE vector length. The default value is 0, that means this +guest is not allowed to use SVE, the maximum value allowed is 2048, any +other value must be multiple of 128. +Please note that if the platform supports a lower value of bits, then the +domain creation will fail. + - xen,enhanced A string property. Possible property values are: diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c index aa5ff52b90c2..81af231d1d87 100644 --- a/xen/arch/arm/domain_build.c +++ b/xen/arch/arm/domain_build.c @@ -3964,6 +3964,13 @@ void __init create_domUs(void) d_cfg.max_maptrack_frames = val; } +if ( dt_property_read_u32(node, "sve", &val) ) +{ +if ( val > UINT16_MAX ) +panic("sve property value (%"PRIu32") overflow\n", val); +d_cfg.arch.sve_vl = domainconfig_encode_vl(val); +} + /* * The variable max_init_domid is initialized with zero, so here it's * very important to use the pre-increment operator to call -- 2.34.1
[PATCH v3 09/10] xen/tools: add sve parameter in XL configuration
Add sve parameter in XL configuration to allow guests to use SVE feature. Signed-off-by: Luca Fancellu Acked-by: George Dunlap --- Changes from v2: - domain configuration field name has changed to sve_vl, also its value now is VL/128. - Add Ack-by George for the Golang bits Changes from v1: - updated to use arch_capabilities field for vector length Changes from RFC: - changed libxl_types.idl sve field to uint16 - now toolstack uses info from physinfo to check against the sve XL value - Changed documentation --- docs/man/xl.cfg.5.pod.in | 11 +++ tools/golang/xenlight/helpers.gen.go | 2 ++ tools/golang/xenlight/types.gen.go | 1 + tools/include/libxl.h| 5 + tools/libs/light/libxl_arm.c | 2 ++ tools/libs/light/libxl_types.idl | 1 + tools/xl/xl_parse.c | 26 -- 7 files changed, 46 insertions(+), 2 deletions(-) diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in index 10f37990be57..adf48fe8ac1d 100644 --- a/docs/man/xl.cfg.5.pod.in +++ b/docs/man/xl.cfg.5.pod.in @@ -2952,6 +2952,17 @@ Currently, only the "sbsa_uart" model is supported for ARM. =back +=item B + +To enable SVE, user must specify a number different from zero, maximum 2048 and +multiple of 128. That value will be the maximum number of SVE registers bits +that the hypervisor will impose to this guest. If the platform has a lower +supported bits value, then the domain creation will fail. +A value equal to zero is the default and it means this guest is not allowed to +use SVE. + +=back + =head3 x86 =over 4 diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go index 35397be2f9e2..72a3a12a6065 100644 --- a/tools/golang/xenlight/helpers.gen.go +++ b/tools/golang/xenlight/helpers.gen.go @@ -1149,6 +1149,7 @@ default: return fmt.Errorf("invalid union key '%v'", x.Type)} x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version) x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart) +x.ArchArm.Sve = uint16(xc.arch_arm.sve) if err := x.ArchX86.MsrRelaxed.fromC(&xc.arch_x86.msr_relaxed);err != nil { return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err) } @@ -1653,6 +1654,7 @@ default: return fmt.Errorf("invalid union key '%v'", x.Type)} xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion) xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart) +xc.arch_arm.sve = C.uint16_t(x.ArchArm.Sve) if err := x.ArchX86.MsrRelaxed.toC(&xc.arch_x86.msr_relaxed); err != nil { return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err) } diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go index 3d968a496744..3dc292b5f1be 100644 --- a/tools/golang/xenlight/types.gen.go +++ b/tools/golang/xenlight/types.gen.go @@ -564,6 +564,7 @@ TypeUnion DomainBuildInfoTypeUnion ArchArm struct { GicVersion GicVersion Vuart VuartType +Sve uint16 } ArchX86 struct { MsrRelaxed Defbool diff --git a/tools/include/libxl.h b/tools/include/libxl.h index 5c65222f1ecb..c9719d2d785a 100644 --- a/tools/include/libxl.h +++ b/tools/include/libxl.h @@ -283,6 +283,11 @@ */ #define LIBXL_HAVE_BUILDINFO_ARCH_ARM_TEE 1 +/* + * libxl_domain_build_info has the arch_arm.sve field. + */ +#define LIBXL_HAVE_BUILDINFO_ARCH_ARM_SVE 1 + /* * LIBXL_HAVE_SOFT_RESET indicates that libxl supports performing * 'soft reset' for domains and there is 'soft_reset' shutdown reason diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c index ddc7b2a15975..16a49031fd51 100644 --- a/tools/libs/light/libxl_arm.c +++ b/tools/libs/light/libxl_arm.c @@ -211,6 +211,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, return ERROR_FAIL; } +config->arch.sve_vl = d_config->b_info.arch_arm.sve; + return 0; } diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl index fd31dacf7d5a..ef4a8358e54e 100644 --- a/tools/libs/light/libxl_types.idl +++ b/tools/libs/light/libxl_types.idl @@ -690,6 +690,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), ("vuart", libxl_vuart_type), + ("sve", uint16), ])), ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool), ])), diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index e344d4fda32e..3962ec6ec8cd 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -12,6 +12,7 @@ * GNU Lesser General Public License for more details. */ +#include #include #include #include @@ -1317,8 +1318,6 @@ void parse_config_data(const char *config_source, exit(EXIT_FAILURE); } -libxl_physinfo_dispose(&physinfo); - config= xlu_cfg_init(stderr, config_source); if (!config) { fprintf(stderr, "Failed to allocate for configu
[PATCH v3 05/10] arm/sve: save/restore SVE context switch
Save/restore context switch for SVE, allocate memory to contain the Z0-31 registers whose length is maximum 2048 bits each and FFR who can be maximum 256 bits, the allocated memory depends on how many bits is the vector length for the domain and how many bits are supported by the platform. Save P0-15 whose length is maximum 256 bits each, in this case the memory used is from the fpregs field in struct vfp_state, because V0-31 are part of Z0-31 and this space would have been unused for SVE domain otherwise. Create zcr_el{1,2} fields in arch_vcpu, initialise zcr_el2 on vcpu creation given the requested vector length and restore it on context switch, save/restore ZCR_EL1 value as well. Remove headers from sve.c that are already included using xen/sched.h. Signed-off-by: Luca Fancellu --- Changes from v2: - No changes Changes from v1: - No changes Changes from RFC: - Moved zcr_el2 field introduction in this patch, restore its content inside sve_restore_state function. (Julien) --- xen/arch/arm/arm64/sve-asm.S | 141 +++ xen/arch/arm/arm64/sve.c | 65 ++- xen/arch/arm/arm64/vfp.c | 79 +++-- xen/arch/arm/domain.c| 7 ++ xen/arch/arm/include/asm/arm64/sve.h | 13 +++ xen/arch/arm/include/asm/arm64/sysregs.h | 3 + xen/arch/arm/include/asm/arm64/vfp.h | 10 ++ xen/arch/arm/include/asm/domain.h| 2 + 8 files changed, 281 insertions(+), 39 deletions(-) diff --git a/xen/arch/arm/arm64/sve-asm.S b/xen/arch/arm/arm64/sve-asm.S index 4d1549344733..8c37d7bc95d5 100644 --- a/xen/arch/arm/arm64/sve-asm.S +++ b/xen/arch/arm/arm64/sve-asm.S @@ -17,6 +17,18 @@ .endif .endm +.macro _sve_check_zreg znr +.if (\znr) < 0 || (\znr) > 31 +.error "Bad Scalable Vector Extension vector register number \znr." +.endif +.endm + +.macro _sve_check_preg pnr +.if (\pnr) < 0 || (\pnr) > 15 +.error "Bad Scalable Vector Extension predicate register number \pnr." +.endif +.endm + .macro _check_num n, min, max .if (\n) < (\min) || (\n) > (\max) .error "Number \n out of range [\min,\max]" @@ -26,6 +38,54 @@ /* SVE instruction encodings for non-SVE-capable assemblers */ /* (pre binutils 2.28, all kernel capable clang versions support SVE) */ +/* STR (vector): STR Z\nz, [X\nxbase, #\offset, MUL VL] */ +.macro _sve_str_v nz, nxbase, offset=0 +_sve_check_zreg \nz +_check_general_reg \nxbase +_check_num (\offset), -0x100, 0xff +.inst 0xe5804000\ +| (\nz) \ +| ((\nxbase) << 5) \ +| (((\offset) & 7) << 10) \ +| (((\offset) & 0x1f8) << 13) +.endm + +/* LDR (vector): LDR Z\nz, [X\nxbase, #\offset, MUL VL] */ +.macro _sve_ldr_v nz, nxbase, offset=0 +_sve_check_zreg \nz +_check_general_reg \nxbase +_check_num (\offset), -0x100, 0xff +.inst 0x85804000\ +| (\nz) \ +| ((\nxbase) << 5) \ +| (((\offset) & 7) << 10) \ +| (((\offset) & 0x1f8) << 13) +.endm + +/* STR (predicate): STR P\np, [X\nxbase, #\offset, MUL VL] */ +.macro _sve_str_p np, nxbase, offset=0 +_sve_check_preg \np +_check_general_reg \nxbase +_check_num (\offset), -0x100, 0xff +.inst 0xe580\ +| (\np) \ +| ((\nxbase) << 5) \ +| (((\offset) & 7) << 10) \ +| (((\offset) & 0x1f8) << 13) +.endm + +/* LDR (predicate): LDR P\np, [X\nxbase, #\offset, MUL VL] */ +.macro _sve_ldr_p np, nxbase, offset=0 +_sve_check_preg \np +_check_general_reg \nxbase +_check_num (\offset), -0x100, 0xff +.inst 0x8580\ +| (\np) \ +| ((\nxbase) << 5) \ +| (((\offset) & 7) << 10) \ +| (((\offset) & 0x1f8) << 13) +.endm + /* RDVL X\nx, #\imm */ .macro _sve_rdvl nx, imm _check_general_reg \nx @@ -35,11 +95,92 @@ | (((\imm) & 0x3f) << 5) .endm +/* RDFFR (unpredicated): RDFFR P\np.B */ +.macro _sve_rdffr np +_sve_check_preg \np +.inst 0x2519f000\ +| (\np) +.endm + +/* WRFFR P\np.B */ +.macro _sve_wrffr np +_sve_check_preg \np +.inst 0x25289000\ +| ((\np) << 5) +.endm + +.macro __for from:req, to:req +.if (\from) == (\to) +_for__body %\from +.else +__for %\from, %((\from) + ((\to) - (\from)) / 2) +__for %((\from) + ((\to) - (\from)) / 2 + 1), %\to +.endif +.endm + +.macro _for var:req, from:req, to:req, insn:vararg +.macro _for__body \var:req +.noaltmacro +\insn +.altmacro +.endm + +.altmacro +__for \from, \to +.noaltmacro + +.purgem _for__body +.endm + +.macro sve_save nxzffrctx, nxpctx, save_ffr +_for n, 0, 31, _sve_str_v \n, \nxzffrctx, \n - 32 +_for n, 0, 15, _sve_str_p \n, \nxpctx,
Re: [PATCH v2] hvc/xen: prevent concurrent accesses to the shared ring
On 30.11.22 16:09, Roger Pau Monne wrote: The hvc machinery registers both a console and a tty device based on the hv ops provided by the specific implementation. Those two interfaces however have different locks, and there's no single locks that's shared between the tty and the console implementations, hence the driver needs to protect itself against concurrent accesses. Otherwise concurrent calls using the split interfaces are likely to corrupt the ring indexes, leaving the console unusable. Introduce a lock to xencons_info to serialize accesses to the shared ring. This is only required when using the shared memory console, concurrent accesses to the hypercall based console implementation are not an issue. Note the conditional logic in domU_read_console() is slightly modified so the notify_daemon() call can be done outside of the locked region: it's an hypercall and there's no need for it to be done with the lock held. Fixes: b536b4b96230 ('xen: use the hvc console infrastructure for Xen console') Signed-off-by: Roger Pau Monné Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH] x86/Xen: make use of IBPB controlling VM assist
On 15.02.23 09:31, Jan Beulich wrote: On 15.02.2023 01:07, Boris Ostrovsky wrote: On 2/14/23 6:53 PM, Boris Ostrovsky wrote: On 2/14/23 11:13 AM, Jan Beulich wrote: --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -18,6 +18,8 @@ #include #include +#include + #include #include #include @@ -32,6 +34,7 @@ #include #include #include +#include #include #include "cpu.h" @@ -934,7 +937,8 @@ do_cmd_auto: break; case RETBLEED_MITIGATION_IBPB: - setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB); + if (!xen_pv_domain() || xen_vm_assist_ibpb(true)) Is this going to compile without CONFIG_XEN? Yes. The declaration of xen_vm_assist_ibpb() is visible (satisfying the compiler) and DCE will eliminate the call to the function due to xen_pv_domain() being constant "false" in that case, avoiding any linking issues. The interesting case here really is building with XEN but without XEN_PV: That's why I needed to put the function in enlighten.c. This wouldn't be needed if xen_pv_domain() was also constant "false" in that case (just like xen_pvh_domain() is when !XEN_PVH). I also think these two conditions should be wrapped into something to limit exposure of non-Xen code to Xen-specific primitives. I would have done so, if I had any halfway sensible idea on how to go about doing so in this particular case. In the absence of that it looked okay-ish to me to reference Xen functions directly here. Oh, and this needs x86 maintainers. Eventually yes. But I would prefer to sort the above question first (which I'm sure would have been raised by them, in perhaps more harsh a way), hence the initially limited exposure. I'd rather add _one_ hook for Xen-PV in check_bugs() just before the call of arch_smt_update(). This can then correct any needed mitigation settings. So something like (note that due to using cpu_feature_enabled(X86_FEATURE_XENPV) DCE is happening in case CONFIG_XEN_PV isn't defined)": --- a/arch/x86/include/asm/xen/hypervisor.h +++ b/arch/x86/include/asm/xen/hypervisor.h @@ -63,4 +63,7 @@ void __init xen_pvh_init(struct boot_params *boot_params); void __init mem_map_via_hcall(struct boot_params *boot_params_p); #endif +int __init xen_vm_assist_ibpb(bool enable); +void __init xen_pv_fix_mitigations(void); + #endif /* _ASM_X86_XEN_HYPERVISOR_H */ --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -18,6 +18,8 @@ #include #include +#include + #include #include #include @@ -177,6 +179,9 @@ void __init check_bugs(void) srbds_select_mitigation(); l1d_flush_select_mitigation(); + if (cpu_feature_enabled(X86_FEATURE_XENPV)) + xen_pv_fix_mitigations(); + arch_smt_update(); #ifdef CONFIG_X86_32 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1476,6 +1476,23 @@ static uint32_t __init xen_platform_pv(void) return 0; } +int __init xen_vm_assist_ibpb(bool enable) +{ + /* +* Note that the VM-assist is a disable, so a request to enable IBPB +* on our behalf needs to turn the functionality off (and vice versa). +*/ + return HYPERVISOR_vm_assist(enable ? VMASST_CMD_disable + : VMASST_CMD_enable, + VMASST_TYPE_mode_switch_no_ibpb); +} + +void __init xen_pv_fix_mitigations(void) +{ + if (!xen_vm_assist_ibpb(true)) + setup_clear_cpu_cap(X86_FEATURE_ENTRY_IBPB); +} + const __initconst struct hypervisor_x86 x86_hyper_xen_pv = { .name = "Xen PV", .detect = xen_platform_pv, --- a/include/xen/interface/xen.h +++ b/include/xen/interface/xen.h @@ -413,7 +413,15 @@ DEFINE_GUEST_HANDLE_STRUCT(mmuext_op); */ #define VMASST_TYPE_runstate_update_flag 5 -#define MAX_VMASST_TYPE 5 +/* + * x86-64 guests: Suppress IBPB on guest-user to guest-kernel mode switch. + * + * By default (on affected and capable hardware) as a safety measure Xen, + * to cover for the fact that guest-kernel and guest-user modes are both + * running in ring 3 (and hence share prediction context), would issue a + * barrier for user->kernel mode switches of PV guests. + */ +#define VMASST_TYPE_mode_switch_no_ibpb 33 #ifndef __ASSEMBLY__ Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH] x86/Xen: make use of IBPB controlling VM assist
On 17/03/2023 1:56 pm, Juergen Gross wrote: > On 15.02.23 09:31, Jan Beulich wrote: >> On 15.02.2023 01:07, Boris Ostrovsky wrote: >>> >>> On 2/14/23 6:53 PM, Boris Ostrovsky wrote: On 2/14/23 11:13 AM, Jan Beulich wrote: > --- a/arch/x86/kernel/cpu/bugs.c > +++ b/arch/x86/kernel/cpu/bugs.c > @@ -18,6 +18,8 @@ > #include > #include > +#include > + > #include > #include > #include > @@ -32,6 +34,7 @@ > #include > #include > #include > +#include > #include > #include "cpu.h" > @@ -934,7 +937,8 @@ do_cmd_auto: > break; > case RETBLEED_MITIGATION_IBPB: > - setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB); > + if (!xen_pv_domain() || xen_vm_assist_ibpb(true)) Is this going to compile without CONFIG_XEN? >> >> Yes. The declaration of xen_vm_assist_ibpb() is visible (satisfying >> the compiler) and DCE will eliminate the call to the function due to >> xen_pv_domain() being constant "false" in that case, avoiding any >> linking issues. The interesting case here really is building with >> XEN but without XEN_PV: That's why I needed to put the function in >> enlighten.c. This wouldn't be needed if xen_pv_domain() was also >> constant "false" in that case (just like xen_pvh_domain() is when >> !XEN_PVH). >> I also think these two conditions should be wrapped into something to limit exposure of non-Xen code to Xen-specific primitives. >> >> I would have done so, if I had any halfway sensible idea on how to >> go about doing so in this particular case. In the absence of that it >> looked okay-ish to me to reference Xen functions directly here. >> >>> Oh, and this needs x86 maintainers. >> >> Eventually yes. But I would prefer to sort the above question first >> (which I'm sure would have been raised by them, in perhaps more >> harsh a way), hence the initially limited exposure. > > I'd rather add _one_ hook for Xen-PV in check_bugs() just before the call > of arch_smt_update(). This can then correct any needed mitigation > settings. > > So something like (note that due to using > cpu_feature_enabled(X86_FEATURE_XENPV) > DCE is happening in case CONFIG_XEN_PV isn't defined)": > > --- a/arch/x86/include/asm/xen/hypervisor.h > +++ b/arch/x86/include/asm/xen/hypervisor.h > @@ -63,4 +63,7 @@ void __init xen_pvh_init(struct boot_params > *boot_params); > void __init mem_map_via_hcall(struct boot_params *boot_params_p); > #endif > > +int __init xen_vm_assist_ibpb(bool enable); > +void __init xen_pv_fix_mitigations(void); I'd suggest 'adjust' in preference to 'fix', but this could equally be xen_pv_mitigations(). XenPV has very legitimate reasons to adjust things owing to it running in Ring3, but "fix" comes with other implications too. > --- a/arch/x86/xen/enlighten_pv.c > +++ b/arch/x86/xen/enlighten_pv.c > @@ -1476,6 +1476,23 @@ static uint32_t __init xen_platform_pv(void) > return 0; > } > > +int __init xen_vm_assist_ibpb(bool enable) > +{ > + /* > + * Note that the VM-assist is a disable, so a request to > enable IBPB > + * on our behalf needs to turn the functionality off (and vice > versa). > + */ > + return HYPERVISOR_vm_assist(enable ? VMASST_CMD_disable > + : VMASST_CMD_enable, > + VMASST_TYPE_mode_switch_no_ibpb); > +} > + > +void __init xen_pv_fix_mitigations(void) > +{ > + if (!xen_vm_assist_ibpb(true)) > + setup_clear_cpu_cap(X86_FEATURE_ENTRY_IBPB); If nothing else, this needs a comment explaining what's going on. "Xen PV guest kernels run in ring3, so share the same prediction domain as userspace. Xen (since version $X) default to issuing an IBPB on guest user -> guest kernel transitions on behalf of the guest kernel. If Linux isn't depending on mode based prediction separation, turn this behaviour off". But this does open the next question. Yes, unilaterally turning turning this off restores the prior behaviour, but is this really the best thing to do ? ~Andrew
Re: [PATCH] x86/Xen: make use of IBPB controlling VM assist
On 17.03.23 15:21, Andrew Cooper wrote: On 17/03/2023 1:56 pm, Juergen Gross wrote: On 15.02.23 09:31, Jan Beulich wrote: On 15.02.2023 01:07, Boris Ostrovsky wrote: On 2/14/23 6:53 PM, Boris Ostrovsky wrote: On 2/14/23 11:13 AM, Jan Beulich wrote: --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -18,6 +18,8 @@ #include #include +#include + #include #include #include @@ -32,6 +34,7 @@ #include #include #include +#include #include #include "cpu.h" @@ -934,7 +937,8 @@ do_cmd_auto: break; case RETBLEED_MITIGATION_IBPB: - setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB); + if (!xen_pv_domain() || xen_vm_assist_ibpb(true)) Is this going to compile without CONFIG_XEN? Yes. The declaration of xen_vm_assist_ibpb() is visible (satisfying the compiler) and DCE will eliminate the call to the function due to xen_pv_domain() being constant "false" in that case, avoiding any linking issues. The interesting case here really is building with XEN but without XEN_PV: That's why I needed to put the function in enlighten.c. This wouldn't be needed if xen_pv_domain() was also constant "false" in that case (just like xen_pvh_domain() is when !XEN_PVH). I also think these two conditions should be wrapped into something to limit exposure of non-Xen code to Xen-specific primitives. I would have done so, if I had any halfway sensible idea on how to go about doing so in this particular case. In the absence of that it looked okay-ish to me to reference Xen functions directly here. Oh, and this needs x86 maintainers. Eventually yes. But I would prefer to sort the above question first (which I'm sure would have been raised by them, in perhaps more harsh a way), hence the initially limited exposure. I'd rather add _one_ hook for Xen-PV in check_bugs() just before the call of arch_smt_update(). This can then correct any needed mitigation settings. So something like (note that due to using cpu_feature_enabled(X86_FEATURE_XENPV) DCE is happening in case CONFIG_XEN_PV isn't defined)": --- a/arch/x86/include/asm/xen/hypervisor.h +++ b/arch/x86/include/asm/xen/hypervisor.h @@ -63,4 +63,7 @@ void __init xen_pvh_init(struct boot_params *boot_params); void __init mem_map_via_hcall(struct boot_params *boot_params_p); #endif +int __init xen_vm_assist_ibpb(bool enable); +void __init xen_pv_fix_mitigations(void); I'd suggest 'adjust' in preference to 'fix', but this could equally be xen_pv_mitigations(). XenPV has very legitimate reasons to adjust things owing to it running in Ring3, but "fix" comes with other implications too. --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1476,6 +1476,23 @@ static uint32_t __init xen_platform_pv(void) return 0; } +int __init xen_vm_assist_ibpb(bool enable) +{ + /* + * Note that the VM-assist is a disable, so a request to enable IBPB + * on our behalf needs to turn the functionality off (and vice versa). + */ + return HYPERVISOR_vm_assist(enable ? VMASST_CMD_disable + : VMASST_CMD_enable, + VMASST_TYPE_mode_switch_no_ibpb); +} + +void __init xen_pv_fix_mitigations(void) +{ + if (!xen_vm_assist_ibpb(true)) + setup_clear_cpu_cap(X86_FEATURE_ENTRY_IBPB); If nothing else, this needs a comment explaining what's going on. "Xen PV guest kernels run in ring3, so share the same prediction domain as userspace. Xen (since version $X) default to issuing an IBPB on guest user -> guest kernel transitions on behalf of the guest kernel. If Linux isn't depending on mode based prediction separation, turn this behaviour off". But this does open the next question. Yes, unilaterally turning turning this off restores the prior behaviour, but is this really the best thing to do ? I believe this question is primarily for Jan, as he is the initial author of the patch. I was just suggesting a variant which is IMHO more probable to be accepted by the x86 maintainers. :-) Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH v4] x86/HVM: support emulated UMIP
On Thu, Apr 15, 2021 at 11:47:42AM +0200, Jan Beulich wrote: > There are three noteworthy drawbacks: > 1) The intercepts we need to enable here are CPL-independent, i.e. we >now have to emulate certain instructions for ring 0. > 2) On VMX there's no intercept for SMSW, so the emulation isn't really >complete there. Then I'm afraid we can't set the bit in the max CPUID policy. What about domains being migrated from a host that has UMIP to an Intel host where UMIP is emulated? They would see a change in behavior in SMSW, and the behavior won't match the ISA anymore. Thanks, Roger.
Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
On Thu, Mar 16, 2023 at 7:09 PM Stefano Stabellini wrote: > > On Thu, 16 Mar 2023, Juergen Gross wrote: > > On 16.03.23 14:53, Alex Deucher wrote: > > > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross wrote: > > > > > > > > On 16.03.23 14:45, Alex Deucher wrote: > > > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich wrote: > > > > > > > > > > > > On 16.03.2023 00:25, Stefano Stabellini wrote: > > > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote: > > > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote: > > > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote: > > > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote: > > > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of > > > > > > > > > > > hardware > > > > > > > > > > > virtualization support when possible. It will using the > > > > > > > > > > > hardware IOMMU > > > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if > > > > > > > > > > > current domain is > > > > > > > > > > > Xen PVH. > > > > > > > > > > > > > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how > > > > > > > > > > can > > > > > > > > > > it get > > > > > > > > > > away without resorting to swiotlb in certain cases (like I/O > > > > > > > > > > to an > > > > > > > > > > address-restricted device)? > > > > > > > > > > > > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, > > > > > > > > > there > > > > > > > > > is no > > > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by > > > > > > > > > the IOMMU > > > > > > > > > so we can use guest physical addresses instead of machine > > > > > > > > > addresses for > > > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is > > > > > > > > > available > > > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the > > > > > > > > > corresponding > > > > > > > > > case is XENFEAT_not_direct_mapped). > > > > > > > > > > > > > > > > But how does Xen using an IOMMU help with, as said, > > > > > > > > address-restricted > > > > > > > > devices? They may still need e.g. a 32-bit address to be > > > > > > > > programmed in, > > > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O > > > > > > > > buffers > > > > > > > > may fulfill this requirement. > > > > > > > > > > > > > > In short, it is going to work as long as Linux has guest physical > > > > > > > addresses (not machine addresses, those could be anything) lower > > > > > > > than > > > > > > > 4GB. > > > > > > > > > > > > > > If the address-restricted device does DMA via an IOMMU, then the > > > > > > > device > > > > > > > gets programmed by Linux using its guest physical addresses (not > > > > > > > machine > > > > > > > addresses). > > > > > > > > > > > > > > The 32-bit restriction would be applied by Linux to its choice of > > > > > > > guest > > > > > > > physical address to use to program the device, the same way it > > > > > > > does > > > > > > > on > > > > > > > native. The device would be fine as it always uses Linux-provided > > > > > > > <4GB > > > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), > > > > > > > we > > > > > > > could get any address, including >4GB addresses, and that is > > > > > > > expected to > > > > > > > work. > > > > > > > > > > > > I understand that's the "normal" way of working. But whatever the > > > > > > swiotlb > > > > > > is used for in baremetal Linux, that would similarly require its use > > > > > > in > > > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look > > > > > > to > > > > > > me like an incomplete attempt to disable its use altogether on x86. > > > > > > What > > > > > > difference of PVH vs baremetal am I missing here? > > > > > > > > > > swiotlb is not usable for GPUs even on bare metal. They often have > > > > > hundreds or megs or even gigs of memory mapped on the device at any > > > > > given time. Also, AMD GPUs support 44-48 bit DMA masks (depending on > > > > > the chip family). > > > > > > > > But the swiotlb isn't per device, but system global. > > > > > > Sure, but if the swiotlb is in use, then you can't really use the GPU. > > > So you get to pick one. > > > > The swiotlb is used only for buffers which are not within the DMA mask of a > > device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA > > mask > > won't use the swiotlb unless you have a buffer above guest physical address > > of > > 16TB (so basically never). > > > > Disabling swiotlb in such a guest would OTOH mean, that a device with only > > 32 bit DMA mask passed through to this guest couldn't work with buffers > > above 4GB. > > > > I don't think this is acceptable. > > From the Xen subsystem in Linux point of view, the only thing we need to > do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not > the global swiotlb) on PVH because it is not needed anyway. > > I think we should leave the global "swiotlb" setti
Re: [PATCH v3 1/6] xen: add reference counter support
On 17.03.2023 11:05, Roger Pau Monné wrote: > On Thu, Mar 16, 2023 at 05:56:00PM +0100, Jan Beulich wrote: >> On 16.03.2023 17:48, Roger Pau Monné wrote: >>> On Thu, Mar 16, 2023 at 05:43:18PM +0100, Jan Beulich wrote: On 16.03.2023 17:39, Roger Pau Monné wrote: > On Thu, Mar 16, 2023 at 05:32:38PM +0100, Jan Beulich wrote: >> On 16.03.2023 17:19, Roger Pau Monné wrote: >>> On Tue, Mar 14, 2023 at 08:56:29PM +, Volodymyr Babchuk wrote: +static inline void refcnt_get(refcnt_t *refcnt) +{ +int old = atomic_add_unless(&refcnt->refcnt, 1, 0); >>> >>> Occurred to me while looking at the next patch: >>> >>> Don't you also need to print a warning (and saturate the counter >>> maybe?) if old == 0, as that would imply the caller is attempting >>> to take a reference of an object that should be destroyed? IOW: it >>> would point to some kind of memory leak. >> >> Hmm, I notice the function presently returns void. I think what to do >> when the counter is zero needs leaving to the caller. See e.g. >> get_page() which will simply indicate failure to the caller in case >> the refcnt is zero. (There overflow handling also is left to the >> caller ... All that matters is whether a ref can be acquired.) > > Hm, likely. I guess pages never go away even when it's refcount > reaches 0. > > For the pdev case attempting to take a refcount on an object that has > 0 refcounts implies that the caller is using leaked memory, as the > point an object reaches 0 it supposed to be destroyed. Hmm, my thinking was that a device would remain at refcnt 0 until it is actually removed, i.e. refcnt == 0 being a prereq for pci_remove_device() to be willing to do anything at all. But maybe that's not a viable model. >>> >>> Right, I think the intention was for pci_remove_device() to drop the >>> refcount to 0 and do the removal, so the refcount should be 1 when >>> calling pci_remove_device(). But none of this is written down, so >>> it's mostly my assumptions from looking at the code. >> >> Could such work at all? The function can't safely drop a reference >> and _then_ check whether it was the last one. The function either has >> to take refcnt == 0 as prereq, or it needs to be the destructor >> function that refcnt_put() calls. > > But then you also get in the trouble of asserting that refcnt == 0 > doesn't change between evaluation and actual removal of the structure. > > Should all refcounts to pdev be taken and dropped while holding the > pcidevs lock? > > I there an email (outside of this series) that contains a description > of how the refcounting is to be used with pdevs? I'm not aware of one. The intentions indeed need outlining somewhere. Jan
Re: [PATCH 4/4] Update README to state Python3 requirement
On 17.03.2023 13:37, George Dunlap wrote: > On Fri, Mar 17, 2023 at 8:46 AM Jan Beulich wrote: > >> On 16.03.2023 18:16, Marek Marczykowski-Górecki wrote: >>> Python2 is not supported anymore. >> >> There are two things here which concern me: For one, how come this is >> at the end of a series? You want to keep in mind that any series may >> be committed piecemeal (unless an indication to the contrary is in >> the cover letter, but there's none here in the first place). >> >> The other aspect is that there's no indication here of it being >> consensus that we raise the baseline requirement for Python, and for >> Python alone. A decision towards the wider topic of raising baseline >> requirements is, as you may recall from the meeting in Cambridge, >> still pending. >> > > To me, the idea behind that discussion was that if we agree on a policy -- > or at least general principles -- then we can avoid having to have > discussions on a case-by-case basis. The fact that the discussion is still > open isn't a reason not to deprecate features; it just means that we still > need to discuss each one on its merits. > > Given that Python 2 was announced unsupported years ago now, it seems > obvious to me that we should stop trying to support it. > > Are you arguing that we *should* continue to support Python2? Do you think > anyone will? I think we really need basics of a policy first. Otherwise what can or cannot be proposed to no longer be supported is just too arbitrary. Here as well as elsewhere my fear is that thing would stop building on rather old distros, where so far things have been building fine. (For Py3 in particular that's no _that_ much of a concern, because quite some time ago qemu already started requiring it. But it would still be a change in how I would need to invoke builds on such old systems that I try Xen out on every once in a while, because I'd then need to override the Python to use not just for qemu, but globally.) Jan
Re: [PATCH v2] xen/console: Skip switching serial input to non existing domains
On 17.03.2023 10:32, Michal Orzel wrote: > > > On 17/03/2023 09:36, Jan Beulich wrote: >> >> >> On 16.03.2023 23:59, Stefano Stabellini wrote: >>> On Thu, 16 Mar 2023, Jan Beulich wrote: On 16.03.2023 11:26, Michal Orzel wrote: > --- a/xen/drivers/char/console.c > +++ b/xen/drivers/char/console.c > @@ -490,7 +490,24 @@ static void switch_serial_input(void) > } > else > { > -console_rx++; > +unsigned int next_rx = console_rx + 1; > + > +/* Skip switching serial input to non existing domains */ > +while ( next_rx < max_init_domid + 1 ) > +{ > +struct domain *d = rcu_lock_domain_by_id(next_rx - 1); > + > +if ( d ) > +{ > +rcu_unlock_domain(d); > +break; > +} > + > +next_rx++; > +} > + > +console_rx = next_rx; > + > printk("*** Serial input to DOM%d", console_rx - 1); > } While at the first glance (when you sent it in reply to v1) it looked okay, I'm afraid it really isn't: Please consider what happens when the last of the DomU-s doesn't exist anymore. (You don't really check whether it still exists, because the range check comes ahead of the existence one.) In that case you want to move from second-to-last to Xen. I expect the entire if/else construct wants to be inside the loop. >>> >>> I don't think we need another loop, just a check if we found a domain or >> >> I didn't say "another loop", but I suggested that the loop needs to be >> around the if/else. Of course this can be transformed into equivalent >> forms, like ... >> >>> not. E.g.: >>> >>> >>> unsigned int next_rx = console_rx + 1; >>> >>> /* Skip switching serial input to non existing domains */ >>> while ( next_rx < max_init_domid + 1 ) >>> { >>> struct domain *d = rcu_lock_domain_by_id(next_rx - 1); >>> >>> if ( d ) >>> { >>> rcu_unlock_domain(d); >>> console_rx = next_rx; >>> printk("*** Serial input to DOM%d", console_rx - 1); >>> break; >>> } >>> >>> next_rx++; >>> } >>> >>> /* no domain found */ >>> console_rx = 0; >>> printk("*** Serial input to Xen"); >> >> ... what you suggest (or at least almost, because the way it's written >> we'd always switch to Xen). > > I would prefer a loop with if/else inside. If you are ok with the following > code > that handles all the cases, I will push a patch in a minute: Looks roughly okay, but recall I said so also when you "pre-posted" the previous version. Jan
Re: [PATCH v8 1/5] xen: introduce CONFIG_GENERIC_BUG_FRAME
On 17.03.2023 10:23, Oleksii wrote: > On Thu, 2023-03-16 at 12:26 +0100, Jan Beulich wrote: >> On 15.03.2023 18:21, Oleksii Kurochko wrote: >>> --- /dev/null >>> +++ b/xen/common/bug.c >>> @@ -0,0 +1,108 @@ >>> +#include >>> +#include >>> +#include >>> +#include >>> +#include >>> +#include >>> +#include >>> +#include >>> + >>> +#include >> >> I actually meant to also ask: What is this needed for? Glancing over >> the >> code ... >> >>> +/* >>> + * Returns a negative value in case of an error otherwise >>> + * BUGFRAME_{run_fn, warn, bug, assert} >>> + */ >>> +int do_bug_frame(struct cpu_user_regs *regs, unsigned long pc) >>> +{ >>> + const struct bug_frame *bug = NULL; >>> + const struct virtual_region *region; >>> + const char *prefix = "", *filename, *predicate; >>> + unsigned long fixup; >>> + unsigned int id, lineno; >>> + >>> + region = find_text_region(pc); >>> + if ( !region ) >>> + return -EINVAL; >>> + >>> + for ( id = 0; id < BUGFRAME_NR; id++ ) >>> + { >>> + const struct bug_frame *b; >>> + size_t i; >>> + >>> + for ( i = 0, b = region->frame[id].bugs; >>> + i < region->frame[id].n_bugs; b++, i++ ) >>> + { >>> + if ( bug_loc(b) == pc ) >>> + { >>> + bug = b; >>> + goto found; >>> + } >>> + } >>> + } >>> + >>> + found: >>> + if ( !bug ) >>> + return -ENOENT; >>> + >>> + if ( id == BUGFRAME_run_fn ) >>> + { >>> + void (*fn)(struct cpu_user_regs *) = bug_ptr(bug); >>> + >>> + fn(regs); >>> + >>> + /* Re-enforce consistent types, because of the casts >>> involved. */ >>> + if ( false ) >>> + run_in_exception_handler(fn); >>> + >>> + return id; >>> + } >>> + >>> + /* WARN, BUG or ASSERT: decode the filename pointer and line >>> number. */ >>> + filename = bug_ptr(bug); >>> + if ( !is_kernel(filename) && !is_patch(filename) ) >>> + return -EINVAL; >>> + fixup = strlen(filename); >>> + if ( fixup > 50 ) >>> + { >>> + filename += fixup - 47; >>> + prefix = "..."; >>> + } >>> + lineno = bug_line(bug); >>> + >>> + switch ( id ) >>> + { >>> + case BUGFRAME_warn: >>> + printk("Xen WARN at %s%s:%d\n", prefix, filename, lineno); >>> + show_execution_state(regs); >>> + >>> + break; >>> + >>> + case BUGFRAME_bug: >>> + printk("Xen BUG at %s%s:%d\n", prefix, filename, lineno); >>> + >>> + if ( BUG_DEBUGGER_TRAP_FATAL(regs) ) >>> + break; >>> + >>> + show_execution_state(regs); >>> + panic("Xen BUG at %s%s:%d\n", prefix, filename, lineno); >>> + >>> + case BUGFRAME_assert: >>> + /* ASSERT: decode the predicate string pointer. */ >>> + predicate = bug_msg(bug); >>> + if ( !is_kernel(predicate) && !is_patch(predicate) ) >>> + predicate = ""; >>> + >>> + printk("Assertion '%s' failed at %s%s:%d\n", >>> + predicate, prefix, filename, lineno); >>> + >>> + if ( BUG_DEBUGGER_TRAP_FATAL(regs) ) >>> + break; >>> + >>> + show_execution_state(regs); >>> + panic("Assertion '%s' failed at %s%s:%d\n", >>> + predicate, prefix, filename, lineno); >>> + } >>> + >>> + return id; >>> +} >> >> ... I can't really spot what it might be that comes from that header. >> Oh, on the N+1st run I've spotted it - it's show_execution_state(). >> The declaration of which, already being used from common code ahead >> of this series, should imo be moved to a common header. I guess I'll >> make yet another patch ... > As mentioned above. Not only show_execution_state() but also > cpu_user_regs structure. ( at lest, for ARM & RISCV ) Do we deref "regs" anywhere? I can't seem to be able to spot an instance. Without a deref (or alike) a forward decl is all that's needed for this code to compile. Jan
Re: [PATCH v4] x86/HVM: support emulated UMIP
On 17.03.2023 15:29, Roger Pau Monné wrote: > On Thu, Apr 15, 2021 at 11:47:42AM +0200, Jan Beulich wrote: >> There are three noteworthy drawbacks: >> 1) The intercepts we need to enable here are CPL-independent, i.e. we >>now have to emulate certain instructions for ring 0. >> 2) On VMX there's no intercept for SMSW, so the emulation isn't really >>complete there. > > Then I'm afraid we can't set the bit in the max CPUID policy. What > about domains being migrated from a host that has UMIP to an Intel > host where UMIP is emulated? They would see a change in behavior in > SMSW, and the behavior won't match the ISA anymore. Right, but that's the price to pay if we want such emulation (which back at the time did look at least desirable, because the other affected insns are more important to deal with). Not setting the bit in the max policy is as good as not having emulation on VMX at all then. Jan
[xen-unstable-smoke test] 179720: tolerable trouble: pass/starved - PUSHED
flight 179720 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/179720/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 1 build-check(1) starved n/a build-armhf 2 hosts-allocate starved n/a version targeted for testing: xen 9bf21fcaef07f68ab52d0382ff554616a1cf66d8 baseline version: xen 9c0061825143716c61622966e76983886ef59361 Last test of basis 179716 2023-03-17 11:01:55 Z0 days Testing same since 179720 2023-03-17 13:01:58 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Michal Orzel jobs: build-arm64-xsm pass build-amd64 pass build-armhf starved build-amd64-libvirt pass test-armhf-armhf-xl starved test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 9c00618251..9bf21fcaef 9bf21fcaef07f68ab52d0382ff554616a1cf66d8 -> smoke
Re: [PATCH v4] x86/HVM: support emulated UMIP
On Fri, Mar 17, 2023 at 04:01:59PM +0100, Jan Beulich wrote: > On 17.03.2023 15:29, Roger Pau Monné wrote: > > On Thu, Apr 15, 2021 at 11:47:42AM +0200, Jan Beulich wrote: > >> There are three noteworthy drawbacks: > >> 1) The intercepts we need to enable here are CPL-independent, i.e. we > >>now have to emulate certain instructions for ring 0. > >> 2) On VMX there's no intercept for SMSW, so the emulation isn't really > >>complete there. > > > > Then I'm afraid we can't set the bit in the max CPUID policy. What > > about domains being migrated from a host that has UMIP to an Intel > > host where UMIP is emulated? They would see a change in behavior in > > SMSW, and the behavior won't match the ISA anymore. > > Right, but that's the price to pay if we want such emulation (which back > at the time did look at least desirable, because the other affected insns > are more important to deal with). Not setting the bit in the max policy > is as good as not having emulation on VMX at all then. It would need some kind of justification at least on why it's deemed worth exposing in the max policy (and thus made available to incoming guests) even when not compliant to the specification. Could the non-intercaption of CR0 reads and thus no #GP on SMSW on Intel lead to software malfunctioning as a result? Thanks, Roger.
Re: [PATCH v3 2/2] x86: detect CMOS aliasing on ports other than 0x70/0x71
On Mon, Jul 20, 2020 at 01:11:18PM +0200, Roger Pau Monné wrote: > On Wed, Jul 15, 2020 at 01:57:07PM +0200, Jan Beulich wrote: > > ... in order to also intercept accesses through the alias ports. > > > > Also stop intercepting accesses to the CMOS ports if we won't ourselves > > use the CMOS RTC. > > Will wait for v4 with the fixed additions to PVH dom0. I think this is waiting for a v4, let me know if that's not the case. Thanks, Roger.
[linux-5.4 test] 179711: regressions - trouble: fail/pass/starved
flight 179711 linux-5.4 real [real] flight 179722 linux-5.4 real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/179711/ http://logs.test-lab.xenproject.org/osstest/logs/179722/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qcow2 21 guest-start/debian.repeat fail REGR. vs. 179598 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 179598 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 179598 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 179598 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 179598 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 179598 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 179598 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 179598 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 179598 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 179598 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-examine 1 build-check(1) starved n/a test-arm64-arm64-libvirt-raw 1 build-check(1) starved n/a build-armhf-libvirt 1 build-check(1) starved n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) starved n/a test-arm64-arm64-xl 1 build-check(1) starved n/a test-arm64-arm64-xl-credit1 1 build-check(1) starved n/a test-arm64-arm64-xl-credit2 1 build-check(1) starved n/a test-arm64-arm64-xl-thunderx 1 build-check(1) starved n/a test-arm64-arm64-xl-vhd 1 build-check(1) starved n/a test-arm64-arm64-xl-xsm 1 build-check(1) starved n/a test-armhf-armhf-examine 1 build-check(1) starved n/a test-armhf-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) starved n/a test-armhf-armhf-libvirt-raw 1 build-check(1) starved n/a test-armhf-armhf-xl 1 build-check(1) starved n/a test-armhf-armhf-xl-credit1 1 build-check(1) starved n/a test-armhf-armhf-xl-credit2 1 build-check(1) starved n/a test-armhf-armhf-xl-cubietruck 1 build-check(1) starved n/a test-armhf-armhf-xl-multivcpu 1 build-check(1) starved n/a test-armhf-armhf-xl-rtds 1 build-check(1) starved n/a test-armhf-armhf-xl-vhd 1 build-check(1) starved n/a build-arm64-pvops 2 hosts-allocate starved n/a build-armhf 2 hosts-allocate starved n/a version targeted for testing: linuxe4b5c766f50525d84754cfdf0e4edef1db9622c0 baseline version: linuxb829e8b6e1a7bb8cd3042b927113a22fed0b0d47 Last test of basis 179598 2023-03-13 09:42:37 Z4 days Testing same since 179711 2023-03-17 07:42:06 Z0 days1 attempts People who touched revisions under test: Al Viro Alex Deucher Alexandre Ghiti Alvaro Karsz Amir Goldstein Andrew Cooper Andrew Morton Ashok Raj Bart Van Assche Bixuan Cui Bjorn Helgaas Borislav Petkov (AMD) Borislav Petkov Chandrakanth Patil Chris Paterson (CIP) Corey Minyard D. Wythe Darrick J. Wong David S. Miller Dmitry Baryshkov Edward Humes Eric Dumazet Eric Whitney Fedor Pchelkin Florian Fainelli Florian Westphal Gavrilov Ilia Greg Kroah-Hartman Guenter Roeck H.J. Lu Hangbin Liu Harry Wentland Harshit Mogalapalli Heiko Carstens Ilia.Gavrilov Jacob Pan Jakub Kicinski Jan Kara Jani Nikula Jens Axboe Joerg Roedel Johan Hovold John Harrison John Paul Adrian Glaubitz Jouni Högander Kang Chen Kim Phillips Laurent Pinchart Lee Jones Liguang Zhang Linux Kernel Functional Testing Lorenz Bauer Lorenz Bauer Lu Baolu Marc Zyngier
Re: [PATCH v4] x86/HVM: support emulated UMIP
On 17/03/2023 2:29 pm, Roger Pau Monné wrote: > On Thu, Apr 15, 2021 at 11:47:42AM +0200, Jan Beulich wrote: >> There are three noteworthy drawbacks: >> 1) The intercepts we need to enable here are CPL-independent, i.e. we >>now have to emulate certain instructions for ring 0. >> 2) On VMX there's no intercept for SMSW, so the emulation isn't really >>complete there. > Then I'm afraid we can't set the bit in the max CPUID policy. What > about domains being migrated from a host that has UMIP to an Intel > host where UMIP is emulated? They would see a change in behavior in > SMSW, and the behavior won't match the ISA anymore. There are conflicting opinions on this. But the truth is that SMSW only leaks the bottom nibble(?) of CR0 and that simply isn't information that is of use to an attacker like SGDT/SIDT is. So from an entirely ideal point of view there is an argument to say that UMIP-but-can't-block-SMSW is better than no UMIP. Except, I'm not fully convinced by this argument. SMSW aside, emulating UMIP on hardware without it involves emulating the guest being able to set CR4.UMIP which is reserved so we have to intercept #UD, and intercepting all #GP so we can find the S{I,LG}DT/STR/SMSW(on AMD) instructions and fail them in Ring3. We went to a lot of effort to not intercept #UD (by default) because it exposed x86_emulate() to guest userspace and caused us a huge number of security headaches. Similarly, #GP interception is the source of a lot of security bugs on other hypervisors. So there is large security concern with this patch. Which is not a no, but definitely is a "need to think about this more carefully". This logic isn't useful for Linux. All versions of Linux which know about UMIP already put the IDT and GDT on read-only mappings to prevent SIDT/SGDT being useful to an attacker on hardware lacking UMIP. I don't know what Windows does here, but I would be amazed if they don't something similar. Therefore, this logic is only useful for guests which do know about UIMP, and do not have any other defences against SIDT/SGD. If this isn't an empty set of kernels, it will be a small set. Also, as a note, the XTF UMIP test declares a failure against KVM (because SMSW does leak), and will do the same on Xen after this patch. I don't think OSSTest will declare this to be a blocking regression, because the XTF test won't be configured for "max", so it won't notice. And because I still haven't got the test-version logic working, we can't modify the XTF UMIP behaviour without breaking the OSSTest bisector... ~Andrew
[PATCH] x86/APIC: modify error_interrupt() to output using single printk()
This takes care of the issue of APIC errors tending to occur on multiple cores at one. In turn this tends to causes the error messages to be merged together, making understanding them difficult. Signed-off-by: Elliott Mitchell --- xen/arch/x86/apic.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c index f71474d47d..5399488120 100644 --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -1412,6 +1412,12 @@ static void cf_check error_interrupt(struct cpu_user_regs *regs) }; unsigned int v, v1; int i; +char head[] = XENLOG_DEBUG "APIC error on CPU%u: %02x(%02x)"; +char entry[] = ", %s"; +/* header string, ", %s" for each potential entry, newline */ +char fmt[sizeof(head) + ARRAY_SIZE(esr_fields) * (sizeof(entry) - 1) + 1]; +const char *args[ARRAY_SIZE(esr_fields)]; +unsigned int count = 0; /* First tickle the hardware, only then report what went on. -- REW */ v = apic_read(APIC_ESR); @@ -1419,12 +1425,18 @@ static void cf_check error_interrupt(struct cpu_user_regs *regs) v1 = apic_read(APIC_ESR); ack_APIC_irq(); -printk(XENLOG_DEBUG "APIC error on CPU%u: %02x(%02x)", -smp_processor_id(), v , v1); +memcpy(fmt, head, sizeof(head) - 1); + for ( i = 7; i >= 0; --i ) -if ( v1 & (1 << i) ) -printk(", %s", esr_fields[i]); -printk("\n"); +if ( v1 & (1 << i) ) { +memcpy(fmt + sizeof(head) + (sizeof(entry) - 1) * count - 1, entry, +sizeof(entry) - 1); +args[count++] = esr_fields[i]; +} +memcpy(fmt + sizeof(head) + (sizeof(entry) - 1) * count - 1, "\n", 2); +/*_Static_assert(ARRAY_SIZE(args) == 8, "printk() here needs args added");*/ +printk(fmt, smp_processor_id(), v , v1, args[0], args[1], args[2], args[3], +args[4], args[5], args[6], args[7]); } /* -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (|ehem+sig...@m5p.comPGP 87145445|) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
[linux-linus test] 179702: regressions - trouble: fail/pass/starved
flight 179702 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/179702/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-freebsd12-amd64 13 guest-start fail REGR. vs. 178042 test-amd64-amd64-xl-credit1 19 guest-saverestore.2 fail REGR. vs. 178042 test-amd64-amd64-xl-shadow 14 guest-start fail REGR. vs. 178042 test-amd64-amd64-xl-xsm 14 guest-start fail REGR. vs. 178042 test-amd64-amd64-xl 14 guest-start fail REGR. vs. 178042 test-amd64-amd64-xl-pvhv2-amd 14 guest-start fail REGR. vs. 178042 test-amd64-amd64-freebsd11-amd64 13 guest-start fail REGR. vs. 178042 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 178042 test-amd64-amd64-xl-multivcpu 20 guest-localmigrate/x10 fail REGR. vs. 178042 test-amd64-amd64-libvirt 18 guest-saverestore.2 fail REGR. vs. 178042 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start fail REGR. vs. 178042 test-amd64-amd64-libvirt-pair 26 guest-migrate/src_host/dst_host fail REGR. vs. 178042 test-amd64-amd64-libvirt-xsm 18 guest-saverestore.2 fail REGR. vs. 178042 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 178042 test-arm64-arm64-xl-thunderx 14 guest-start fail REGR. vs. 178042 test-arm64-arm64-xl-xsm 17 guest-stop fail REGR. vs. 178042 test-arm64-arm64-xl-credit2 14 guest-start fail REGR. vs. 178042 test-arm64-arm64-libvirt-xsm 14 guest-start fail REGR. vs. 178042 test-arm64-arm64-xl 18 guest-start/debian.repeat fail REGR. vs. 178042 test-amd64-amd64-xl-credit2 22 guest-start/debian.repeat fail REGR. vs. 178042 test-arm64-arm64-xl-credit1 18 guest-start/debian.repeat fail REGR. vs. 178042 test-amd64-amd64-qemuu-nested-intel 12 debian-hvm-install fail REGR. vs. 178042 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 18 guest-localmigrate/x10 fail REGR. vs. 178042 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install fail REGR. vs. 178042 test-amd64-coresched-amd64-xl 14 guest-start fail REGR. vs. 178042 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 178042 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail REGR. vs. 178042 test-amd64-amd64-xl-qemuu-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 178042 test-amd64-amd64-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 178042 test-amd64-amd64-xl-qemut-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 178042 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 178042 test-amd64-amd64-pair25 guest-start/debian fail REGR. vs. 178042 test-amd64-amd64-qemuu-nested-amd 12 debian-hvm-install fail REGR. vs. 178042 test-amd64-amd64-xl-pvshim 14 guest-start fail REGR. vs. 178042 test-amd64-amd64-xl-pvhv2-intel 14 guest-start fail REGR. vs. 178042 test-amd64-amd64-xl-vhd 12 debian-di-installfail REGR. vs. 178042 test-amd64-amd64-pygrub 12 debian-di-installfail REGR. vs. 178042 test-amd64-amd64-libvirt-raw 12 debian-di-installfail REGR. vs. 178042 test-amd64-amd64-libvirt-qcow2 12 debian-di-install fail REGR. vs. 178042 test-arm64-arm64-xl-vhd 12 debian-di-installfail REGR. vs. 178042 test-arm64-arm64-libvirt-raw 12 debian-di-installfail REGR. vs. 178042 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-rtds 18 guest-localmigrate fail REGR. vs. 178042 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 178042 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 178042 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 178042 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 178042 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-examine 1 build-check(1) starved n/a test-armhf-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) starved n/a test-armhf-a
Re: [BUG] x2apic broken with current AMD hardware
On Fri, Mar 17, 2023 at 09:22:09AM +0100, Jan Beulich wrote: > On 16.03.2023 23:03, Elliott Mitchell wrote: > > On Mon, Mar 13, 2023 at 08:01:02AM +0100, Jan Beulich wrote: > >> On 11.03.2023 01:09, Elliott Mitchell wrote: > >>> On Thu, Mar 09, 2023 at 10:03:23AM +0100, Jan Beulich wrote: > > In any event you will want to collect a serial log at maximum verbosity. > It would also be of interest to know whether turning off the IOMMU avoids > the issue as well (on the assumption that your system has less than 255 > CPUs). > >>> > >>> I think I might have figured out the situation in a different fashion. > >>> > >>> I was taking a look at the BIOS manual for this motherboard and noticed > >>> a mention of a "Local APIC Mode" setting. Four values are listed > >>> "Compatibility", "xAPIC", "x2APIC", and "Auto". > >>> > >>> That is the sort of setting I likely left at "Auto" and that may well > >>> result in x2 functionality being disabled. Perhaps the x2APIC > >>> functionality on AMD is detecting whether the hardware is present, and > >>> failing to test whether it has been enabled? (could be useful to output > >>> a message suggesting enabling the hardware feature) > >> > >> Can we please move to a little more technical terms here? What is "present" > >> and "enabled" in your view? I don't suppose you mean the CPUID bit (which > >> we check) and the x2APIC-mode-enable one (which we drive as needed). It's > >> also left unclear what the four modes of BIOS operation evaluate to. Even > >> if we knew that, overriding e.g. "Compatibility" (which likely means some > >> form of "disabled" / "hidden") isn't normally an appropriate thing to do. > >> In "Auto" mode Xen likely should work - the only way I could interpret the > >> the other modes are "xAPIC" meaning no x2APIC ACPI tables entries (and > >> presumably the CPUID bit also masked), "x2APIC" meaning x2APIC mode pre- > >> enabled by firmware, and "Auto" leaving it to the OS to select. Yet that's > >> speculation on my part ... > > > > I provided the information I had discovered. There is a setting for this > > motherboard (likely present on some similar motherboards) which /may/ > > effect the issue. I doubt I've tried "compatibility", but none of the > > values I've tried have gotten the system to boot without "x2apic=false" > > on Xen's command-line. > > > > When setting to "x2APIC" just after "(XEN) AMD-Vi: IOMMU Extended Features:" > > I see the line "(XEN) - x2APIC". Later is the line > > "(XEN) x2APIC mode is already enabled by BIOS." I'll guess "Auto" > > leaves the x2APIC turned off since neither line is present. > > When "(XEN) - x2APIC" is absent the IOMMU can't be switched into x2APIC > mode. Are you sure that's the case when using "Auto"? grep -eAPIC\ driver -e-\ x2APIC: "Auto": (XEN) Using APIC driver default (XEN) Overriding APIC driver with bigsmp (XEN) Switched to APIC driver x2apic_cluster "x2APIC": (XEN) Using APIC driver x2apic_cluster (XEN) - x2APIC Yes, I'm sure. > > Both cases the line "(XEN) Switched to APIC driver x2apic_cluster" is > > present (so perhaps "Auto" merely doesn't activate it). > > Did you also try "x2apic_phys" on the Xen command line (just to be sure > this isn't a clustered-mode only issue)? No. In fact x2apic_cluster is mentioned in all failure cases. > > Appears error_interrupt() needs locking or some concurrency handling > > mechanism since the last error is jumbled. With the setting "x2APIC" > > I get a bunch of: > > "(XEN) APIC error on CPU#: 00(08)(XEN) APIC error on CPU#: 00(08)" > > (apparently one for each core) > > Followed by "Receive accept error, Receive accept error," (again, > > apparently one for each core). Then a bunch of newlines (same pattern). > > This is a known issue, but since the messages shouldn't appear in the > first place so far no-one has bothered to address this. I won't claim it is the best solution, but see other message. I'm tempted to propose allowing _Static_assert() since it is valuable functionality for preventing issues. > > With the setting "auto" the last message is a series of > > "(XEN) CPU#: No irq handler for vector ## (IRQ -2147483648, LAPIC)" on > > 2 different cores. Rather more of the lines were from one core, the > > vector value varied some. > > > > Notable both sets of final error messages appeared after the Domain 0 > > kernel thought it had been operating for >30 seconds. Lack of > > response to interrupt generating events (pressing keys on USB keyboard) > > suggests interrupts weren't getting through. > > > > > > With "x2apic=false" error messages similar to the "Local APIC Mode" > > of "x2APIC" appear >45 seconds after Domain 0 kernel start. Of note > > first "(XEN) APIC error on CPU#: 00(08)(XEN) APIC error on CPU#: 00(08)" > > appears for all cores with "Receive accept error,". > > > > Yet later a variation on this message starts appearing: > > "(XEN) APIC error on CPU#: 08(08)(XEN) APIC error on CPU#: 08(0
Re: [GIT PULL] xen: branch for v6.3-rc3
The pull request you sent on Fri, 17 Mar 2023 07:40:23 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git > for-linus-6.3-rc3-tag has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/0eb392ec095ee29b932985deefb43e6d86d8a463 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
[libvirt test] 179706: tolerable trouble: pass/starved - PUSHED
flight 179706 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/179706/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass build-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-libvirt-raw 1 build-check(1) starved n/a test-armhf-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) starved n/a build-armhf 2 hosts-allocate starved n/a version targeted for testing: libvirt 5589a3e1f30ab6b9b3247b98f1a6697b0578923a baseline version: libvirt 8386242bd0f6c1cb242f9c711e2ef864bf114d0d Last test of basis 179640 2023-03-15 04:19:43 Z2 days Failing since179665 2023-03-16 04:18:51 Z1 days2 attempts Testing same since 179706 2023-03-17 04:18:50 Z0 days1 attempts People who touched revisions under test: Jiri Denemark Ján Tomko Michal Privoznik Or Ozeri Tim Wiederhake jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf starved build-i386 pass build-amd64-libvirt pass build-arm64-libvirt pass build-armhf-libvirt starved build-i386-libvirt pass build-amd64-pvopspass build-arm64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass test-amd64-amd64-libvirt-xsm pass test-arm64-arm64-libvirt-xsm pass test-amd64-i386-libvirt-xsm pass test-amd64-amd64-libvirt pass test-arm64-arm64-libvirt pass test-armhf-armhf-libvirt starved test-amd64-i386-libvirt pass test-amd64-amd64-libvirt-pairpass test-amd64-i386-libvirt-pair pass test-arm64-arm64-libvirt-qcow2 pass test-armhf-armhf-libvirt-qcow2 starved test-arm64-arm64-libvirt-raw pass test-armhf-armhf-libvirt-raw starved test-amd64-i386-libvirt-raw pass test-amd64-amd64-libvirt-vhd pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports,
Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi
On Fri, 17 Mar 2023, Roger Pau Monné wrote: > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > >> So yes, it then all boils down to that Linux- > > >> internal question. > > > > > > Excellent question but we'll have to wait for Ray as he is the one with > > > access to the hardware. But I have this data I can share in the > > > meantime: > > > > > > [1.260378] IRQ to pin mappings: > > > [1.260387] IRQ1 -> 0:1 > > > [1.260395] IRQ2 -> 0:2 > > > [1.260403] IRQ3 -> 0:3 > > > [1.260410] IRQ4 -> 0:4 > > > [1.260418] IRQ5 -> 0:5 > > > [1.260425] IRQ6 -> 0:6 > > > [1.260432] IRQ7 -> 0:7 > > > [1.260440] IRQ8 -> 0:8 > > > [1.260447] IRQ9 -> 0:9 > > > [1.260455] IRQ10 -> 0:10 > > > [1.260462] IRQ11 -> 0:11 > > > [1.260470] IRQ12 -> 0:12 > > > [1.260478] IRQ13 -> 0:13 > > > [1.260485] IRQ14 -> 0:14 > > > [1.260493] IRQ15 -> 0:15 > > > [1.260505] IRQ106 -> 1:8 > > > [1.260513] IRQ112 -> 1:4 > > > [1.260521] IRQ116 -> 1:13 > > > [1.260529] IRQ117 -> 1:14 > > > [1.260537] IRQ118 -> 1:15 > > > [1.260544] done. > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > Linux running baremetal on the same hardware? > > So I have some emails from Ray from he time he was looking into this, > and on Linux dom0 PVH dmesg there is: > > [0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec0, GSI 0-23 > [0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI > 24-55 > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > consistent. > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > > 112 (which is the one causing issues): > > > > > > __acpi_register_gsi-> > > > acpi_register_gsi_ioapic-> > > > mp_map_gsi_to_irq-> > > > mp_map_pin_to_irq-> > > > __irq_resolve_mapping() > > > > > > if (likely(data)) { > > > desc = irq_data_to_desc(data); > > > if (irq) > > > *irq = data->irq; > > > /* this IRQ is 112, IO-APIC-34 domain */ > > > } > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > __acpi_register_gsi hook is installed for PVH in order to setup GSIs > using PHYSDEV ops instead of doing it natively from the IO-APIC? > > FWIW, the introduced function in that patch > (acpi_register_gsi_xen_pvh()) seems to unconditionally call > acpi_register_gsi_ioapic() without checking if the GSI is already > registered, which might lead to multiple IRQs being allocated for the > same underlying GSI? I understand this point and I think it needs investigating. > As I commented there, I think that approach is wrong. If the GSI has > not been mapped in Xen (because dom0 hasn't unmasked the respective > IO-APIC pin) we should add some logic in the toolstack to map it > before attempting to bind. But this statement confuses me. The toolstack doesn't get involved in IRQ setup for PCI devices for HVM guests? Keep in mind that this is a regular HVM guest creation on PVH Dom0, so normally the IRQ setup is done by QEMU, and QEMU already calls xc_physdev_map_pirq and xc_domain_bind_pt_pci_irq. So I don't follow your statement about "the toolstack to map it before attempting to bind".
Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi
On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > > >> So yes, it then all boils down to that Linux- > > > >> internal question. > > > > > > > > Excellent question but we'll have to wait for Ray as he is the one with > > > > access to the hardware. But I have this data I can share in the > > > > meantime: > > > > > > > > [1.260378] IRQ to pin mappings: > > > > [1.260387] IRQ1 -> 0:1 > > > > [1.260395] IRQ2 -> 0:2 > > > > [1.260403] IRQ3 -> 0:3 > > > > [1.260410] IRQ4 -> 0:4 > > > > [1.260418] IRQ5 -> 0:5 > > > > [1.260425] IRQ6 -> 0:6 > > > > [1.260432] IRQ7 -> 0:7 > > > > [1.260440] IRQ8 -> 0:8 > > > > [1.260447] IRQ9 -> 0:9 > > > > [1.260455] IRQ10 -> 0:10 > > > > [1.260462] IRQ11 -> 0:11 > > > > [1.260470] IRQ12 -> 0:12 > > > > [1.260478] IRQ13 -> 0:13 > > > > [1.260485] IRQ14 -> 0:14 > > > > [1.260493] IRQ15 -> 0:15 > > > > [1.260505] IRQ106 -> 1:8 > > > > [1.260513] IRQ112 -> 1:4 > > > > [1.260521] IRQ116 -> 1:13 > > > > [1.260529] IRQ117 -> 1:14 > > > > [1.260537] IRQ118 -> 1:15 > > > > [1.260544] done. > > > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > > Linux running baremetal on the same hardware? > > > > So I have some emails from Ray from he time he was looking into this, > > and on Linux dom0 PVH dmesg there is: > > > > [0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec0, GSI > > 0-23 > > [0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI > > 24-55 > > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > > consistent. > > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > > > 112 (which is the one causing issues): > > > > > > > > __acpi_register_gsi-> > > > > acpi_register_gsi_ioapic-> > > > > mp_map_gsi_to_irq-> > > > > mp_map_pin_to_irq-> > > > > __irq_resolve_mapping() > > > > > > > > if (likely(data)) { > > > > desc = irq_data_to_desc(data); > > > > if (irq) > > > > *irq = data->irq; > > > > /* this IRQ is 112, IO-APIC-34 domain */ > > > > } > > > > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs > > using PHYSDEV ops instead of doing it natively from the IO-APIC? > > > > FWIW, the introduced function in that patch > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call > > acpi_register_gsi_ioapic() without checking if the GSI is already > > registered, which might lead to multiple IRQs being allocated for the > > same underlying GSI? > > I understand this point and I think it needs investigating. > > > > As I commented there, I think that approach is wrong. If the GSI has > > not been mapped in Xen (because dom0 hasn't unmasked the respective > > IO-APIC pin) we should add some logic in the toolstack to map it > > before attempting to bind. > > But this statement confuses me. The toolstack doesn't get involved in > IRQ setup for PCI devices for HVM guests? It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that cold be removed (maybe for qemu-trad only?) or it's also required by QEMU upstream, I would have to investigate more. It's my understanding it's in pci_add_dm_done() where Ray was getting the mismatched IRQ vs GSI number. Thanks, Roger.
[PATCH v2 0/2] Fixing error_interrupt()'s messages v2
Using a somewhat distinct strategy with the printk(). This is two patches, since nominally the first can work with other strategies. I could see them being squashed during commit to the main repository. Elliott Mitchell (2): x86/APIC: include full string with error_interrupt() error messages x86/APIC: modify error_interrupt() to output using single printk() xen/arch/x86/apic.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@drgnwing.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
[PATCH v2 1/2] x86/APIC: include full string with error_interrupt() error messages
Rather than adding ", " with each printf(), simply include them in the string initially. Signed-off-by: Elliott Mitchell --- xen/arch/x86/apic.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c index f71474d47d..8cfb8cd71c 100644 --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -1401,14 +1401,14 @@ static void cf_check spurious_interrupt(struct cpu_user_regs *regs) static void cf_check error_interrupt(struct cpu_user_regs *regs) { static const char *const esr_fields[] = { -"Send CS error", -"Receive CS error", -"Send accept error", -"Receive accept error", -"Redirectable IPI", -"Send illegal vector", -"Received illegal vector", -"Illegal register address", +", Send CS error", +", Receive CS error", +", Send accept error", +", Receive accept error", +", Redirectable IPI", +", Send illegal vector", +", Received illegal vector", +", Illegal register address", }; unsigned int v, v1; int i; @@ -1423,7 +1423,7 @@ static void cf_check error_interrupt(struct cpu_user_regs *regs) smp_processor_id(), v , v1); for ( i = 7; i >= 0; --i ) if ( v1 & (1 << i) ) -printk(", %s", esr_fields[i]); +printk("%s", esr_fields[i]); printk("\n"); } -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@drgnwing.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
[PATCH v2 2/2] x86/APIC: modify error_interrupt() to output using single printk()
This takes care of the issue of APIC errors tending to occur on multiple cores at one. In turn this tends to causes the error messages to be merged together, making understanding them difficult. Signed-off-by: Elliott Mitchell --- xen/arch/x86/apic.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c index 8cfb8cd71c..89abd1ea51 100644 --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -1410,6 +1410,7 @@ static void cf_check error_interrupt(struct cpu_user_regs *regs) ", Received illegal vector", ", Illegal register address", }; +const char *entries[ARRAY_SIZE(esr_fields)]; unsigned int v, v1; int i; @@ -1419,12 +1420,12 @@ static void cf_check error_interrupt(struct cpu_user_regs *regs) v1 = apic_read(APIC_ESR); ack_APIC_irq(); -printk(XENLOG_DEBUG "APIC error on CPU%u: %02x(%02x)", -smp_processor_id(), v , v1); for ( i = 7; i >= 0; --i ) -if ( v1 & (1 << i) ) -printk("%s", esr_fields[i]); -printk("\n"); +entries[i] = v1 & (1 << i) ? esr_fields[i] : ""; +printk(XENLOG_DEBUG "APIC error on CPU%u: %02x(%02x)" +"%s%s%s%s%s%s%s%s" "\n", +smp_processor_id(), v , v1, entries[0], entries[1], entries[2], +entries[3], entries[4], entries[5], entries[6], entries[7]); } /* -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@drgnwing.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi
On Fri, 17 Mar 2023, Roger Pau Monné wrote: > On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: > > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > > > >> So yes, it then all boils down to that Linux- > > > > >> internal question. > > > > > > > > > > Excellent question but we'll have to wait for Ray as he is the one > > > > > with > > > > > access to the hardware. But I have this data I can share in the > > > > > meantime: > > > > > > > > > > [1.260378] IRQ to pin mappings: > > > > > [1.260387] IRQ1 -> 0:1 > > > > > [1.260395] IRQ2 -> 0:2 > > > > > [1.260403] IRQ3 -> 0:3 > > > > > [1.260410] IRQ4 -> 0:4 > > > > > [1.260418] IRQ5 -> 0:5 > > > > > [1.260425] IRQ6 -> 0:6 > > > > > [1.260432] IRQ7 -> 0:7 > > > > > [1.260440] IRQ8 -> 0:8 > > > > > [1.260447] IRQ9 -> 0:9 > > > > > [1.260455] IRQ10 -> 0:10 > > > > > [1.260462] IRQ11 -> 0:11 > > > > > [1.260470] IRQ12 -> 0:12 > > > > > [1.260478] IRQ13 -> 0:13 > > > > > [1.260485] IRQ14 -> 0:14 > > > > > [1.260493] IRQ15 -> 0:15 > > > > > [1.260505] IRQ106 -> 1:8 > > > > > [1.260513] IRQ112 -> 1:4 > > > > > [1.260521] IRQ116 -> 1:13 > > > > > [1.260529] IRQ117 -> 1:14 > > > > > [1.260537] IRQ118 -> 1:15 > > > > > [1.260544] done. > > > > > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > > > Linux running baremetal on the same hardware? > > > > > > So I have some emails from Ray from he time he was looking into this, > > > and on Linux dom0 PVH dmesg there is: > > > > > > [0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec0, GSI > > > 0-23 > > > [0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI > > > 24-55 > > > > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > > > consistent. > > > > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ > > > > > == > > > > > 112 (which is the one causing issues): > > > > > > > > > > __acpi_register_gsi-> > > > > > acpi_register_gsi_ioapic-> > > > > > mp_map_gsi_to_irq-> > > > > > mp_map_pin_to_irq-> > > > > > __irq_resolve_mapping() > > > > > > > > > > if (likely(data)) { > > > > > desc = irq_data_to_desc(data); > > > > > if (irq) > > > > > *irq = data->irq; > > > > > /* this IRQ is 112, IO-APIC-34 domain */ > > > > > } > > > > > > > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC > > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs > > > using PHYSDEV ops instead of doing it natively from the IO-APIC? > > > > > > FWIW, the introduced function in that patch > > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call > > > acpi_register_gsi_ioapic() without checking if the GSI is already > > > registered, which might lead to multiple IRQs being allocated for the > > > same underlying GSI? > > > > I understand this point and I think it needs investigating. > > > > > > > As I commented there, I think that approach is wrong. If the GSI has > > > not been mapped in Xen (because dom0 hasn't unmasked the respective > > > IO-APIC pin) we should add some logic in the toolstack to map it > > > before attempting to bind. > > > > But this statement confuses me. The toolstack doesn't get involved in > > IRQ setup for PCI devices for HVM guests? > > It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call > to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that > cold be removed (maybe for qemu-trad only?) or it's also required by > QEMU upstream, I would have to investigate more. You are right. I am not certain, but it seems like a mistake in the toolstack to me. In theory, pci_add_dm_done should only be needed for PV guests, not for HVM guests. I am not sure. But I can see the call to xc_physdev_map_pirq you were referring to now. > It's my understanding it's in pci_add_dm_done() where Ray was getting > the mismatched IRQ vs GSI number. I think the mismatch was actually caused by the xc_physdev_map_pirq call from QEMU, which makes sense because in any case it should happen before the same call done by pci_add_dm_done (pci_add_dm_done is called after sending the pci passthrough QMP command to QEMU). So the first to hit the IRQ!=GSI problem would be QEMU.
[qemu-mainline test] 179708: regressions - trouble: fail/pass/starved
flight 179708 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/179708/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-libvirt 14 guest-start fail REGR. vs. 179518 test-amd64-i386-libvirt-xsm 14 guest-start fail REGR. vs. 179518 test-amd64-i386-libvirt-pair 25 guest-start/debian fail REGR. vs. 179518 test-amd64-amd64-xl-qcow212 debian-di-installfail REGR. vs. 179518 test-amd64-amd64-libvirt-vhd 12 debian-di-installfail REGR. vs. 179518 test-amd64-amd64-libvirt-xsm 14 guest-start fail REGR. vs. 179518 test-arm64-arm64-libvirt-xsm 14 guest-start fail REGR. vs. 179518 test-amd64-i386-libvirt-raw 12 debian-di-installfail REGR. vs. 179518 test-amd64-amd64-libvirt 14 guest-start fail REGR. vs. 179518 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install fail REGR. vs. 179518 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install fail REGR. vs. 179518 test-arm64-arm64-libvirt-raw 12 debian-di-installfail REGR. vs. 179518 test-amd64-i386-xl-vhd 12 debian-di-installfail REGR. vs. 179518 test-arm64-arm64-xl-vhd 12 debian-di-installfail REGR. vs. 179518 test-amd64-amd64-libvirt-pair 25 guest-start/debian fail REGR. vs. 179518 Tests which are failing intermittently (not blocking): test-amd64-i386-pair 10 xen-install/src_host fail in 179682 pass in 179708 test-amd64-i386-pair 11 xen-install/dst_host fail in 179682 pass in 179708 test-amd64-i386-qemuu-rhel6hvm-amd 7 xen-install fail in 179682 pass in 179708 test-amd64-i386-freebsd10-i386 7 xen-install fail pass in 179682 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 179518 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 179518 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 179518 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 179518 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 179518 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) starved n/a test-armhf-armhf-libvirt-raw 1 build-check(1) starved n/a test-armhf-armhf-xl 1 build-check(1) starved n/a test-armhf-armhf-xl-credit1 1 build-check(1) starved n/a test-armhf-armhf-xl-multivcpu 1 build-check(1) starved n/a test-armhf-armhf-xl-rtds 1 build-check(1) starved n/a test-armhf-armhf-xl-vhd 1 build-check(1) starved n/a build-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-xl-credit2 1 build-check(1) starved n/a test-armhf-armhf-xl-cubietruck 1 build-check(1) starved n/a build-armhf 2 hosts-allocate starved n/a version targeted for testing: qemuu9636e513255362c4a329e3e5fb2c97dab3c5ce47 baseline version: qemuu7b0f0aa55fd292fa3489755a3a896e496c51ea86 Last test of basis 179518 2023-03-09 10:37:19 Z8 days Failing since179526 2023-03-10 01:53:40 Z7 days 13 attempts Testing same since 179682 2023-03-16 13:21:44 Z1 days2 attempts People who touched revisions under test: Akihiko Odaki Albert Esteve Alex Bennée Alex Williamson Alistair Francis Andreas Schwab Anton Johansson Avihai Horon BALATON Zoltan Bernhard Beschow Carlos López Cédric Le Goater Cédric Le Goater Damien Hedde Daniel P. Berran
Re: [PATCH 1/2] automation: build 6.1.19 kernel for x86-64 dom0
On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote: > It will be used in tests added in subsequent patches. > Enable config options needed for those tests. Thanks for the patch! Looks great. Can you also remove the old 5.10.74 Dockerfile (automation/tests-artifacts/kernel/5.10.74.dockerfile), the kernel-5.10.74-export job as well and replace kernel-5.10.74-export with kernel-6.1.19-export as dependency for the qemu-alpine-x86_64-gcc test job? I think it makes sense to just keep kernel-6.1.19-export. I tested the below already and it works fine. diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml index 2e1a6886df..f28c01fe0e 100644 --- a/automation/gitlab-ci/test.yaml +++ b/automation/gitlab-ci/test.yaml @@ -269,7 +269,7 @@ qemu-alpine-x86_64-gcc: needs: - alpine-3.12-gcc - alpine-3.12-rootfs-export -- kernel-5.10.74-export +- kernel-6.1.19-export > Signed-off-by: Marek Marczykowski-Górecki > --- > automation/gitlab-ci/build.yaml | 11 - > automation/tests-artifacts/kernel/6.1.19.dockerfile | 40 ++- > 2 files changed, 51 insertions(+) > create mode 100644 automation/tests-artifacts/kernel/6.1.19.dockerfile > > diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml > index 38bb22d8609b..e1799d454c76 100644 > --- a/automation/gitlab-ci/build.yaml > +++ b/automation/gitlab-ci/build.yaml > @@ -790,3 +790,14 @@ kernel-5.10.74-export: >- binaries/bzImage >tags: > - x86_64 > + > +kernel-6.1.19-export: > + extends: .test-jobs-artifact-common > + image: registry.gitlab.com/xen-project/xen/tests-artifacts/kernel:6.1.19 > + script: > +- mkdir binaries && cp /bzImage binaries/bzImage > + artifacts: > +paths: > + - binaries/bzImage > + tags: > +- x86_64 > diff --git a/automation/tests-artifacts/kernel/6.1.19.dockerfile > b/automation/tests-artifacts/kernel/6.1.19.dockerfile > new file mode 100644 > index ..c2171555a0a6 > --- /dev/null > +++ b/automation/tests-artifacts/kernel/6.1.19.dockerfile > @@ -0,0 +1,40 @@ > +FROM debian:unstable > +LABEL maintainer.name="The Xen Project" \ > + maintainer.email="xen-devel@lists.xenproject.org" > + > +ENV DEBIAN_FRONTEND=noninteractive > +ENV LINUX_VERSION=6.1.19 > +ENV USER root > + > +RUN mkdir /build > +WORKDIR /build > + > +# build depends > +RUN apt-get update && \ > +apt-get --quiet --yes install \ > +build-essential \ > +libssl-dev \ > +bc \ > +curl \ > +flex \ > +bison \ > +libelf-dev \ > +&& \ > +apt-get autoremove -y && \ > +apt-get clean && \ > +rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/* > + > +# Build the kernel > +RUN curl -fsSLO > https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-"$LINUX_VERSION".tar.xz && > \ > +tar xvJf linux-"$LINUX_VERSION".tar.xz && \ > +cd linux-"$LINUX_VERSION" && \ > +make defconfig && \ > +make xen.config && \ > +scripts/config --enable BRIDGE && \ > +scripts/config --enable IGC && \ > +cp .config .config.orig && \ > +cat .config.orig | grep XEN | grep =m |sed 's/=m/=y/g' >> .config && \ > +make -j$(nproc) bzImage && \ > +cp arch/x86/boot/bzImage / && \ > +cd /build && \ > +rm -rf linux-"$LINUX_VERSION"* > -- > git-series 0.9.1 >
[ovmf test] 179730: all pass - PUSHED
flight 179730 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/179730/ Perfect :-) All tests in this flight passed as required version targeted for testing: ovmf b17a3a133b18fb41493fba7d86e9b5804ea6a8cf baseline version: ovmf 410ca0ff94a42ee541dd6ceab70ea974eeb7e500 Last test of basis 179713 2023-03-17 08:15:15 Z0 days Testing same since 179730 2023-03-17 18:22:13 Z0 days1 attempts People who touched revisions under test: Rebecca Cran jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/osstest/ovmf.git 410ca0ff94..b17a3a133b b17a3a133b18fb41493fba7d86e9b5804ea6a8cf -> xen-tested-master
Re: [PATCH 2/2] automation: add a suspend test on an Alder Lake system
On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote: > This is a first test using Qubes OS CI infra. The gitlab-runner has > access to ssh-based control interface (control@thor.testnet, ssh key > exposed to the test via ssh-agent) and pre-configured HTTP dir for boot > files (mapped under /scratch/gitlab-runner/tftp inside the container). > Details about the setup are described on > https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/ > > This test boots Xen, and try if S3 works. It runs on a ADL-based desktop > system. The test script is based on the Xilinx one. > > The machine needs newer kernel than other x86 tests run, so use 6.1.x > kernel added in previous commit. > > When building rootfs, use fakeroot to preserve device files when > repacking rootfs archives in a non-privileged container. > > Signed-off-by: Marek Marczykowski-Górecki This is awesome!! Thanks Marek! > --- > I'm bad at naming things. Things that I could use naming suggestions: > - test script (qubes-x86-64-suspend.sh) - this might be okay? > - test template job name (.adl-x86-64) > - test job name (adl-suspend-x86-64-gcc) > - tag (qubes-hw2) I think these names are OK. I would maybe rename the tag "qubes-hw2" to "qubes" because so far there is only one but I am fine with qubes-hw2 also. > For context, our CI has several machines, named test-X or hwX, each > controlled with matching hal900X RPi (this is where gitlab-runner is). > This test uses only one specific hw, but I plan adding few others too. > --- > automation/gitlab-ci/test.yaml | 31 - > automation/scripts/qubes-x86-64-suspend.sh | 155 ++- > 2 files changed, 186 insertions(+) > create mode 100755 automation/scripts/qubes-x86-64-suspend.sh > > diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml > index 2e1a6886df7f..f5511dd6da70 100644 > --- a/automation/gitlab-ci/test.yaml > +++ b/automation/gitlab-ci/test.yaml > @@ -15,6 +15,10 @@ > .arm32-test-needs: &arm32-test-needs >- qemu-system-aarch64-6.0.0-arm32-export > > +.x86-64-test-needs: &x86-64-test-needs > + - alpine-3.12-rootfs-export > + - kernel-6.1.19-export As you know, the jobs starting with a "." are template jobs to avoid repeating the same things over and over. .x86-64-test-needs could be used in qemu-alpine-x86_64-gcc also. > .qemu-arm64: >extends: .test-jobs-common >variables: > @@ -84,6 +88,25 @@ >tags: > - xilinx > > +.adl-x86-64: > + extends: .test-jobs-common > + variables: > +# the test controller runs on RPi4 > +CONTAINER: alpine:3.12-arm64v8 > +LOGFILE: smoke-test.log > + artifacts: > +paths: > + - smoke.serial > + - '*.log' > +when: always > + only: > +variables: > + - $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true" Let me know which trees should have QUBES_JOBS set to true (thus able to start Qubes jobs.) At a minimum, I think we would want https://gitlab.com/xen-project/xen to test "staging" and "master". I can set QUBES_JOBS to true to https://gitlab.com/xen-project/xen if you are OK with it. > + before_script: > +- apk --no-cache add openssh-client fakeroot This would work but we typically try to avoid installing more packages in the test jobs for speed and efficiency. Instead, add those packages directly to the build container dockerfile (automation/build/alpine/3.12-arm64v8.dockerfile), e.g.: diff --git a/automation/build/alpine/3.12-arm64v8.dockerfile b/automation/build/alpine/3.12-arm64v8.dockerfile index 180c978964..3f1e6a3fc6 100644 --- a/automation/build/alpine/3.12-arm64v8.dockerfile +++ b/automation/build/alpine/3.12-arm64v8.dockerfile @@ -41,3 +41,6 @@ RUN apk --no-cache add \ libattr \ libcap-ng-dev \ pixman-dev \ + # qubes test deps + openssh-client \ + fakeroot \ But I am not sure why you need fakeroot, more questions below about it > + tags: > +- qubes-hw2 > + > # Test jobs > build-each-commit-gcc: >extends: .test-jobs-common > @@ -110,6 +133,14 @@ xilinx-smoke-dom0less-arm64-gcc: > - *arm64-test-needs > - alpine-3.12-gcc-arm64 > > +adl-suspend-x86-64-gcc: > + extends: .adl-x86-64 > + script: > +- ./automation/scripts/qubes-x86-64-suspend.sh s3 2>&1 | tee ${LOGFILE} > + needs: > +- *x86-64-test-needs > +- alpine-3.12-gcc This up to you, but qubes-x86-64-suspend.sh is very cool and sophisticated and I would use it to create at least 2 jobs: - a plain dom0 + domU boot (no suspend) - a suspend/resume job I am happy either way > qemu-smoke-dom0-arm64-gcc: >extends: .qemu-arm64 >script: > diff --git a/automation/scripts/qubes-x86-64-suspend.sh > b/automation/scripts/qubes-x86-64-suspend.sh > new file mode 100755 > index ..fc479c9faaec > --- /dev/null > +++ b/automation/scripts/qubes-x86-64-suspend.sh > @@ -0,0 +1,155 @@ > +#!/bin/sh > + > +set -ex > + > +test_variant=$1 > + > +wait_and_wakeup= > +if [ -z "${te
Re: [RFC QEMU PATCH 12/18] softmmu: Fix the size to map cache with xen for host virtual address
On Sun, 12 Mar 2023, Huang Rui wrote: > The xen_map_cache function wants to pass offset and size of this memory > block as the input parameters to map the host virtual address. However, > block->offset is too large as 0x1 (4G), if we assign the size as > block->max_length (0x11000), the mapped host address will be out of > block->max_length and easy to overflow. Hi Ray, Is this patch still required after all the other fixes? If it is required, where is the overflow that it is trying to prevent? Is it a failure in the hypercall mapping the memory to QEMU (hw/i386/xen/xen-mapcache.c:xen_remap_bucket) ? > We have to assign the size as > (block->max_length - block->offset), then that is able to ensure the > address will be located in legal range inside of max_length. > > {rcu = {next = 0x0, func = 0x0}, mr = 0x5681b620, host = 0x0, > colo_cache = 0x0, offset = 0x1, used_length = 0x11000, > max_length = 0x11000, resized = 0x0, flags = 0x10, idstr = {0x78, > 0x65, 0x6e, 0x2e, 0x72, 0x61, 0x6d, 0x0 }, next = { > le_next = 0x568c61b0, le_prev = 0x5681c640}, > ramblock_notifiers = {lh_first = 0x0}, fd = 0x, page_size = > 0x1000, bmap = 0x0, receivedmap = 0x0, clear_bmap = 0x0, > clear_bmap_shift = 0x0, postcopy_length = 0x0} > > Signed-off-by: Huang Rui > --- > softmmu/physmem.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/softmmu/physmem.c b/softmmu/physmem.c > index 1b606a3002..1b0bb35da9 100644 > --- a/softmmu/physmem.c > +++ b/softmmu/physmem.c > @@ -2304,7 +2304,7 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t > addr) > return xen_map_cache(addr, 0, 0, false); > } > > -block->host = xen_map_cache(block->offset, block->max_length, 1, > false); > + block->host = xen_map_cache(block->offset, block->max_length, 1, false); Coding style: indentation is 4 spaces. In any case, this looks like a spurious change? > } > return ramblock_ptr(block, addr); > } > @@ -2337,7 +2337,8 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, > ram_addr_t addr, > return xen_map_cache(addr, *size, lock, lock); > } > > -block->host = xen_map_cache(block->offset, block->max_length, 1, > lock); > + block->host = xen_map_cache(block->offset, > + block->max_length - block->offset, 1, lock); > } > return ramblock_ptr(block, addr); block->offset is the address of the beginning of the block, and block->max_length is the size. Here the behavior is theoretically correct: if block->host is not set (not mapped in QEMU yet), then call xen_map_cache to map the entire block from beginning to end, setting block->host with a pointer to the beginning of the mapped area in QEMU. >From that point onward, ramblock_ptr() will then behave correctly. Of course if xen_map_cache fails to map the entire region at once because it is too large or other error, then we have a big problem. But I think in that case this patch would still cause issues. In this example offset (start of the ramblock) is 0x1, and max_length (size of the ramblock) is 0x11000. So with this change we are mapping 0x11000-0x1 = 0x1000 which is only the first 256MB of the region which is more than 4GB. What happens the next time qemu_ram_ptr_length is called for an address above the first 256MB? It will break because block->host != NULL so the function will behave as if the entire ramblock is mapped in QEMU while it is not (only the first 256MB are). ramblock_ptr will return block->host + something-more-than-256MB which is actually invalid. I think we would need more something along this line where we fall back to temporary mappings of a smaller region if we can't map it all at once. MAX_SIZE would be the max size where a single mapping still succeeds, maybe 4GB? if (block->offset == 0 || block->max_length > MAX_SIZE) { return xen_map_cache(addr, *size, lock, lock); } Otherwise, maybe the error could be due to max_length being incorrect to begin with?
Re: [RFC QEMU PATCH 13/18] hw/i386/xen/xen-hvm: Introduce xen_ram_block_check function
On Sun, 12 Mar 2023, Huang Rui wrote: > Introduce xen_ram_block_check function to check whether current ramblock > is xen ram memory. > > Signed-off-by: Huang Rui > --- > hw/i386/xen/xen-hvm.c | 15 +++ > include/hw/xen/xen.h | 1 + > 2 files changed, 16 insertions(+) > > diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c > index e4293d6d66..a4f12aefce 100644 > --- a/hw/i386/xen/xen-hvm.c > +++ b/hw/i386/xen/xen-hvm.c > @@ -32,6 +32,7 @@ > #include "sysemu/xen.h" > #include "sysemu/xen-mapcache.h" > #include "trace.h" > +#include "include/exec/ramblock.h" > > #include > #include > @@ -1564,6 +1565,20 @@ void xen_register_framebuffer(MemoryRegion *mr) > framebuffer = mr; > } > > +bool xen_ram_block_check(RAMBlock *rb) > +{ > + bool ret; > + > + if (!rb) > + return false; > + > + ret = (rb == ram_memory.ram_block); > + if (ret) > + rb->offset = 0; I take that this is needed because there is a ramblock that is ram_memory but with offset != 0? So it would fail the block->offset == 0 check in qemu_ram_ptr_length (which is meant to capture all accesses to ram_memory, but failing at it)? If so, would it be possible to just do this instead: diff --git a/softmmu/physmem.c b/softmmu/physmem.c index fb412a56e1..3e2640dabd 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -2149,7 +2149,7 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr, * because we don't want to map the entire memory in QEMU. * In that case just map the requested area. */ -if (block->offset == 0) { +if (block->offset == 0 || block == ram_memory.ram_block) { return xen_map_cache(addr, *size, lock, lock); } > + return ret; > +} > + > void xen_shutdown_fatal_error(const char *fmt, ...) > { > va_list ap; > diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h > index afdf9c436a..99a383eb17 100644 > --- a/include/hw/xen/xen.h > +++ b/include/hw/xen/xen.h > @@ -31,5 +31,6 @@ qemu_irq *xen_interrupt_controller_init(void); > void xenstore_store_pv_console_info(int i, Chardev *chr); > > void xen_register_framebuffer(struct MemoryRegion *mr); > +bool xen_ram_block_check(RAMBlock *rb); > > #endif /* QEMU_HW_XEN_H */ > -- > 2.25.1 >
Re: [PATCH 2/2] automation: add a suspend test on an Alder Lake system
On Fri, Mar 17, 2023 at 04:10:22PM -0700, Stefano Stabellini wrote: > On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote: > > This is a first test using Qubes OS CI infra. The gitlab-runner has > > access to ssh-based control interface (control@thor.testnet, ssh key > > exposed to the test via ssh-agent) and pre-configured HTTP dir for boot > > files (mapped under /scratch/gitlab-runner/tftp inside the container). > > Details about the setup are described on > > https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/ > > > > This test boots Xen, and try if S3 works. It runs on a ADL-based desktop > > system. The test script is based on the Xilinx one. > > > > The machine needs newer kernel than other x86 tests run, so use 6.1.x > > kernel added in previous commit. > > > > When building rootfs, use fakeroot to preserve device files when > > repacking rootfs archives in a non-privileged container. > > > > Signed-off-by: Marek Marczykowski-Górecki > > This is awesome!! Thanks Marek! > > > > --- > > I'm bad at naming things. Things that I could use naming suggestions: > > - test script (qubes-x86-64-suspend.sh) - this might be okay? > > - test template job name (.adl-x86-64) > > - test job name (adl-suspend-x86-64-gcc) > > - tag (qubes-hw2) > > I think these names are OK. I would maybe rename the tag "qubes-hw2" to > "qubes" because so far there is only one but I am fine with qubes-hw2 > also. I have 10 of them (and growing), so I'd like to keep tag name at least somehow referencing which runner it uses. For example, this one is a desktop with Alder Lake, but some other tests I may want to use a laptop with Tiger Lake, for example. The mapping name -> hw spec isn't publicly written down (although our openQA publishes all kind of logs from them, so it's possible to infer this info). > > For context, our CI has several machines, named test-X or hwX, each > > controlled with matching hal900X RPi (this is where gitlab-runner is). > > This test uses only one specific hw, but I plan adding few others too. > > --- > > automation/gitlab-ci/test.yaml | 31 - > > automation/scripts/qubes-x86-64-suspend.sh | 155 ++- > > 2 files changed, 186 insertions(+) > > create mode 100755 automation/scripts/qubes-x86-64-suspend.sh > > > > diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml > > index 2e1a6886df7f..f5511dd6da70 100644 > > --- a/automation/gitlab-ci/test.yaml > > +++ b/automation/gitlab-ci/test.yaml > > @@ -15,6 +15,10 @@ > > .arm32-test-needs: &arm32-test-needs > >- qemu-system-aarch64-6.0.0-arm32-export > > > > +.x86-64-test-needs: &x86-64-test-needs > > + - alpine-3.12-rootfs-export > > + - kernel-6.1.19-export > > As you know, the jobs starting with a "." are template jobs to avoid > repeating the same things over and over. .x86-64-test-needs could be > used in qemu-alpine-x86_64-gcc also. Right, when switching all of them to 6.1 kernel, that makes sense. > > .qemu-arm64: > >extends: .test-jobs-common > >variables: > > @@ -84,6 +88,25 @@ > >tags: > > - xilinx > > > > +.adl-x86-64: > > + extends: .test-jobs-common > > + variables: > > +# the test controller runs on RPi4 > > +CONTAINER: alpine:3.12-arm64v8 > > +LOGFILE: smoke-test.log > > + artifacts: > > +paths: > > + - smoke.serial > > + - '*.log' > > +when: always > > + only: > > +variables: > > + - $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true" > > Let me know which trees should have QUBES_JOBS set to true (thus able to > start Qubes jobs.) At a minimum, I think we would want > https://gitlab.com/xen-project/xen to test "staging" and "master". I can > set QUBES_JOBS to true to https://gitlab.com/xen-project/xen if you are > OK with it. Yes, that's perfectly okay. I'd like at least also staging-4.17, but depending on push frequency other staging-* are probably fine too (I very much doubt long queue would be an issue). Of course that assumes those tests would be backported, which I'm not sure if is planned. I'm also okay with allowing committers and/or other maintainers to use it on demand, but preferably not all patchew branches. BTW, if you trigger job manually via a web interface or API, you can specify variables for just a single pipeline. And if you use that feature, you can also make gitlab present you a bit more convenient interface with variables listed already. See example here: https://gitlab.com/QubesOS/qubes-continuous-integration/-/blob/main/.gitlab-ci.yml#L15-26 This could be useful to start only a subset of tests. > > + before_script: > > +- apk --no-cache add openssh-client fakeroot > > This would work but we typically try to avoid installing more packages > in the test jobs for speed and efficiency. Yes, I was being lazy and wanted to avoid rebuilding container (and changing from where it's pulled) when developing the test. Sure, will fold t
[xen-unstable test] 179717: tolerable trouble: fail/pass/starved
flight 179717 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/179717/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-install fail in 179697 pass in 179717 test-amd64-i386-examine-uefi 6 xen-installfail pass in 179697 test-amd64-i386-qemuu-rhel6hvm-amd 7 xen-install fail pass in 179697 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail pass in 179697 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 179697 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 179697 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 179697 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 179697 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 179697 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 179697 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 179697 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 179697 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 179697 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass build-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-examine 1 build-check(1) starved n/a test-armhf-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) starved n/a test-armhf-armhf-libvirt-raw 1 build-check(1) starved n/a test-armhf-armhf-xl 1 build-check(1) starved n/a test-armhf-armhf-xl-credit1 1 build-check(1) starved n/a test-armhf-armhf-xl-credit2 1 build-check(1) starved n/a test-armhf-armhf-xl-cubietruck 1 build-check(1) starved n/a test-armhf-armhf-xl-multivcpu 1 build-check(1) starved n/a test-armhf-armhf-xl-rtds 1 build-check(1) starved n/a test-armhf-armhf-xl-vhd 1 build-check(1) starved n/a build-armhf 2 hosts-allocate starved n/a version targeted for testing: xen 36e49fc8cbad21a4308b4701caaa685ef04e120b baseline version: xen 36e49fc8cbad21a4308b4701caaa685ef04e120b Last test of basis 179717 2023-03-17 11:14:24 Z0 days Testing same since (not found) 0 attempts jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64-xtf pass build-amd64 pass build-arm64
[linux-5.4 test] 179725: regressions - trouble: fail/pass/starved
flight 179725 linux-5.4 real [real] http://logs.test-lab.xenproject.org/osstest/logs/179725/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qcow2 21 guest-start/debian.repeat fail REGR. vs. 179598 Tests which are failing intermittently (not blocking): test-amd64-i386-pair 11 xen-install/dst_host fail pass in 179711 test-amd64-i386-xl-vhd 21 guest-start/debian.repeat fail pass in 179711 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 179598 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 179598 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 179598 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 179598 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 179598 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 179598 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 179598 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 179598 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 179598 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-raw 1 build-check(1) starved in 179711 n/a test-arm64-arm64-xl-vhd 1 build-check(1) starved in 179711 n/a test-arm64-arm64-xl-thunderx 1 build-check(1) starved in 179711 n/a test-arm64-arm64-xl-credit1 1 build-check(1) starved in 179711 n/a test-arm64-arm64-examine 1 build-check(1) starved in 179711 n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) starved in 179711 n/a test-arm64-arm64-xl-credit2 1 build-check(1) starved in 179711 n/a test-arm64-arm64-xl-xsm 1 build-check(1) starved in 179711 n/a test-arm64-arm64-xl 1 build-check(1) starved in 179711 n/a build-arm64-pvops 2 hosts-allocate starved in 179711 n/a build-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-examine 1 build-check(1) starved n/a test-armhf-armhf-libvirt 1 build-check(1) starved n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) starved n/a test-armhf-armhf-libvirt-raw 1 build-check(1) starved n/a test-armhf-armhf-xl 1 build-check(1) starved n/a test-armhf-armhf-xl-credit1 1 build-check(1) starved n/a test-armhf-armhf-xl-credit2 1 build-check(1) starved n/a test-armhf-armhf-xl-cubietruck 1 build-check(1) starved n/a test-armhf-armhf-xl-multivcpu 1 build-check(1) starved n/a test-armhf-armhf-xl-rtds 1 build-check(1) starved n/a test-armhf-armhf-xl-vhd 1 build-check(1) starved n/a b