Re: [BUG] x2apic broken with current AMD hardware

2023-03-17 Thread Jan Beulich
On 16.03.2023 23:03, Elliott Mitchell wrote:
> On Mon, Mar 13, 2023 at 08:01:02AM +0100, Jan Beulich wrote:
>> On 11.03.2023 01:09, Elliott Mitchell wrote:
>>> On Thu, Mar 09, 2023 at 10:03:23AM +0100, Jan Beulich wrote:

 In any event you will want to collect a serial log at maximum verbosity.
 It would also be of interest to know whether turning off the IOMMU avoids
 the issue as well (on the assumption that your system has less than 255
 CPUs).
>>>
>>> I think I might have figured out the situation in a different fashion.
>>>
>>> I was taking a look at the BIOS manual for this motherboard and noticed
>>> a mention of a "Local APIC Mode" setting.  Four values are listed
>>> "Compatibility", "xAPIC", "x2APIC", and "Auto".
>>>
>>> That is the sort of setting I likely left at "Auto" and that may well
>>> result in x2 functionality being disabled.  Perhaps the x2APIC
>>> functionality on AMD is detecting whether the hardware is present, and
>>> failing to test whether it has been enabled?  (could be useful to output
>>> a message suggesting enabling the hardware feature)
>>
>> Can we please move to a little more technical terms here? What is "present"
>> and "enabled" in your view? I don't suppose you mean the CPUID bit (which
>> we check) and the x2APIC-mode-enable one (which we drive as needed). It's
>> also left unclear what the four modes of BIOS operation evaluate to. Even
>> if we knew that, overriding e.g. "Compatibility" (which likely means some
>> form of "disabled" / "hidden") isn't normally an appropriate thing to do.
>> In "Auto" mode Xen likely should work - the only way I could interpret the
>> the other modes are "xAPIC" meaning no x2APIC ACPI tables entries (and
>> presumably the CPUID bit also masked), "x2APIC" meaning x2APIC mode pre-
>> enabled by firmware, and "Auto" leaving it to the OS to select. Yet that's
>> speculation on my part ...
> 
> I provided the information I had discovered.  There is a setting for this
> motherboard (likely present on some similar motherboards) which /may/
> effect the issue.  I doubt I've tried "compatibility", but none of the
> values I've tried have gotten the system to boot without "x2apic=false"
> on Xen's command-line.
> 
> When setting to "x2APIC" just after "(XEN) AMD-Vi: IOMMU Extended Features:"
> I see the line "(XEN) - x2APIC".  Later is the line
> "(XEN) x2APIC mode is already enabled by BIOS."  I'll guess "Auto"
> leaves the x2APIC turned off since neither line is present.

When "(XEN) - x2APIC" is absent the IOMMU can't be switched into x2APIC
mode. Are you sure that's the case when using "Auto"?

> Both cases the line "(XEN) Switched to APIC driver x2apic_cluster" is
> present (so perhaps "Auto" merely doesn't activate it).

Did you also try "x2apic_phys" on the Xen command line (just to be sure
this isn't a clustered-mode only issue)?

> Appears error_interrupt() needs locking or some concurrency handling
> mechanism since the last error is jumbled.  With the setting "x2APIC"
> I get a bunch of:
> "(XEN) APIC error on CPU#: 00(08)(XEN) APIC error on CPU#: 00(08)"
> (apparently one for each core)
> Followed by "Receive accept error, Receive accept error," (again,
> apparently one for each core).  Then a bunch of newlines (same pattern).

This is a known issue, but since the messages shouldn't appear in the
first place so far no-one has bothered to address this.

> With the setting "auto" the last message is a series of
> "(XEN) CPU#: No irq handler for vector ## (IRQ -2147483648, LAPIC)" on
> 2 different cores.  Rather more of the lines were from one core, the
> vector value varied some.
> 
> Notable both sets of final error messages appeared after the Domain 0
> kernel thought it had been operating for >30 seconds.  Lack of
> response to interrupt generating events (pressing keys on USB keyboard)
> suggests interrupts weren't getting through.
> 
> 
> With "x2apic=false" error messages similar to the "Local APIC Mode"
> of "x2APIC" appear >45 seconds after Domain 0 kernel start.  Of note
> first "(XEN) APIC error on CPU#: 00(08)(XEN) APIC error on CPU#: 00(08)"
> appears for all cores with "Receive accept error,".
> 
> Yet later a variation on this message starts appearing:
> "(XEN) APIC error on CPU#: 08(08)(XEN) APIC error on CPU#: 08(08)"
> this one appears multiple times.

That's certainly odd, as it suggests that things were okay for a short
while.

> If one was to want full logs, the lack of secure communications channel
> would be an issue (since filtering out identifying data is difficult).
> DSA-3072 with SHA2-256 is now less than wonderful, but DSA-1024 and
> ElGamal 2048 are right out.

I think how good or bad my publicly available key is doesn't matter here
at all. You're not asked to provide the logs to me or any other
individual; you're asked to provide the logs to the community, such that
anyone interested may take a stab at addressing the issue. Eventual
patches may also want to refer to (parts of) such log

Re: [PATCH v2] xen/console: Skip switching serial input to non existing domains

2023-03-17 Thread Jan Beulich
On 16.03.2023 23:59, Stefano Stabellini wrote:
> On Thu, 16 Mar 2023, Jan Beulich wrote:
>> On 16.03.2023 11:26, Michal Orzel wrote:
>>> --- a/xen/drivers/char/console.c
>>> +++ b/xen/drivers/char/console.c
>>> @@ -490,7 +490,24 @@ static void switch_serial_input(void)
>>>  }
>>>  else
>>>  {
>>> -console_rx++;
>>> +unsigned int next_rx = console_rx + 1;
>>> +
>>> +/* Skip switching serial input to non existing domains */
>>> +while ( next_rx < max_init_domid + 1 )
>>> +{
>>> +struct domain *d = rcu_lock_domain_by_id(next_rx - 1);
>>> +
>>> +if ( d )
>>> +{
>>> +rcu_unlock_domain(d);
>>> +break;
>>> +}
>>> +
>>> +next_rx++;
>>> +}
>>> +
>>> +console_rx = next_rx;
>>> +
>>>  printk("*** Serial input to DOM%d", console_rx - 1);
>>>  }
>>
>> While at the first glance (when you sent it in reply to v1) it looked okay,
>> I'm afraid it really isn't: Please consider what happens when the last of
>> the DomU-s doesn't exist anymore. (You don't really check whether it still
>> exists, because the range check comes ahead of the existence one.) In that
>> case you want to move from second-to-last to Xen. I expect the entire
>> if/else construct wants to be inside the loop.
> 
> I don't think we need another loop, just a check if we found a domain or

I didn't say "another loop", but I suggested that the loop needs to be
around the if/else. Of course this can be transformed into equivalent
forms, like ...

> not. E.g.:
> 
> 
> unsigned int next_rx = console_rx + 1;
> 
> /* Skip switching serial input to non existing domains */
> while ( next_rx < max_init_domid + 1 )
> {
> struct domain *d = rcu_lock_domain_by_id(next_rx - 1);
> 
> if ( d )
> {
> rcu_unlock_domain(d);
> console_rx = next_rx;
> printk("*** Serial input to DOM%d", console_rx - 1);
> break;
> }
> 
> next_rx++;
> }
> 
> /* no domain found */
> console_rx = 0;
> printk("*** Serial input to Xen");

... what you suggest (or at least almost, because the way it's written
we'd always switch to Xen).

Jan



Re: [PATCH v3 4/6] vpci: restrict unhandled read/write operations for guests

2023-03-17 Thread Roger Pau Monné
On Tue, Mar 14, 2023 at 08:56:30PM +, Volodymyr Babchuk wrote:
> From: Oleksandr Andrushchenko 
> 
> A guest would be able to read and write those registers which are not
> emulated and have no respective vPCI handlers, so it will be possible
> for it to access the hardware directly.
> In order to prevent a guest from reads and writes from/to the unhandled
> registers make sure only hardware domain can access the hardware directly
> and restrict guests from doing so.
> 
> Suggested-by: Roger Pau Monné 
> Signed-off-by: Oleksandr Andrushchenko 
> 
> ---
> 
> v3:
>  - No changes
> 
> Older comments from another series:
> 
> Since v6:
> - do not use is_hwdom parameter for vpci_{read|write}_hw and use
>   current->domain internally
> - update commit message
> New in v6
> Moved into another series
> ---
>  xen/drivers/vpci/vpci.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 5232f9605b..199ff55672 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -220,6 +220,10 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned 
> int reg,
>  {
>  uint32_t data;
>  
> +/* Guest domains are not allowed to read real hardware. */
> +if ( !is_hardware_domain(current->domain) )
> +return ~(uint32_t)0;
> +
>  switch ( size )
>  {
>  case 4:
> @@ -260,9 +264,13 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned 
> int reg,
>  return data;
>  }
>  
> -static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg, unsigned int 
> size,
> -  uint32_t data)
> +static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg,
> +  unsigned int size, uint32_t data)

Unrelated change? The parameter list doesn't go over 80 characters so
this rearranging is not necessary, and in any case should be done in a
separate commit or at least mentioned in the commit log.

>  {
> +/* Guest domains are not allowed to write real hardware. */

I would maybe write this as:

"Unprivileged domain are not allowed unhandled accesses to the config
space."

But that's mostly a nit, and would also apply to the comment in
vpci_read_hw().

Thanks, Roger.



Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi

2023-03-17 Thread Jan Beulich
On 17.03.2023 00:19, Stefano Stabellini wrote:
> On Thu, 16 Mar 2023, Jan Beulich wrote:
>> So yes, it then all boils down to that Linux-
>> internal question.
> 
> Excellent question but we'll have to wait for Ray as he is the one with
> access to the hardware. But I have this data I can share in the
> meantime:
> 
> [1.260378] IRQ to pin mappings:
> [1.260387] IRQ1 -> 0:1
> [1.260395] IRQ2 -> 0:2
> [1.260403] IRQ3 -> 0:3
> [1.260410] IRQ4 -> 0:4
> [1.260418] IRQ5 -> 0:5
> [1.260425] IRQ6 -> 0:6
> [1.260432] IRQ7 -> 0:7
> [1.260440] IRQ8 -> 0:8
> [1.260447] IRQ9 -> 0:9
> [1.260455] IRQ10 -> 0:10
> [1.260462] IRQ11 -> 0:11
> [1.260470] IRQ12 -> 0:12
> [1.260478] IRQ13 -> 0:13
> [1.260485] IRQ14 -> 0:14
> [1.260493] IRQ15 -> 0:15
> [1.260505] IRQ106 -> 1:8
> [1.260513] IRQ112 -> 1:4
> [1.260521] IRQ116 -> 1:13
> [1.260529] IRQ117 -> 1:14
> [1.260537] IRQ118 -> 1:15
> [1.260544]  done.

And what does Linux think are IRQs 16 ... 105? Have you compared with
Linux running baremetal on the same hardware?

Jan

> And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> 112 (which is the one causing issues):
> 
> __acpi_register_gsi->
> acpi_register_gsi_ioapic->
> mp_map_gsi_to_irq->
> mp_map_pin_to_irq->
> __irq_resolve_mapping()
> 
> if (likely(data)) {
> desc = irq_data_to_desc(data);
> if (irq)
> *irq = data->irq;
> /* this IRQ is 112, IO-APIC-34 domain */
> }




Re: [PATCH v3 5/6] vpci: use reference counter to protect vpci state

2023-03-17 Thread Roger Pau Monné
On Tue, Mar 14, 2023 at 08:56:30PM +, Volodymyr Babchuk wrote:
> vPCI MMIO handlers are accessing pdevs without protecting this
> access with pcidevs_{lock|unlock}. This is not a problem as of now
> as these are only used by Dom0. But, towards vPCI is used also for
> guests, we need to properly protect pdev and pdev->vpci from being
> removed while still in use.
> 
> For that use pdev reference counting.

I wonder whether vPCI does need to take another reference to the
device.  This all stems from me not having it fully clear how the
reference counting is supposed to be used for pdevs.

As mentioned in a previous patch, I would expect device assignation to
take a reference, and hence vPCI won't need to take an extra refcount
since vPCI can only be used once the device has been assigned to a
domain, and hence already has at least an extra reference taken from
the fact it's assigned to a domain.

If anything I would add an ASSERT(pdev->refcount > 1) or equivalent.

> 
> Signed-off-by: Volodymyr Babchuk 
> Suggested-by: Jan Beulich 
> 
> ---
> 
> v3:
>  - Moved from another patch series
> ---
>  xen/drivers/vpci/vpci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 199ff55672..005f38dc77 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -62,6 +62,7 @@ void vpci_remove_device(struct pci_dev *pdev)
>  xfree(pdev->vpci->msi);
>  xfree(pdev->vpci);
>  pdev->vpci = NULL;
> +pcidev_put(pdev);
>  }
>  
>  int vpci_add_handlers(struct pci_dev *pdev)
> @@ -72,6 +73,8 @@ int vpci_add_handlers(struct pci_dev *pdev)
>  if ( !has_vpci(pdev->domain) )
>  return 0;
>  
> +pcidev_get(pdev);
> +
>  /* We should not get here twice for the same device. */
>  ASSERT(!pdev->vpci);

You are missing a pcidev_put() in case allocation of pdev->vpci fails.

Thanks, Roger.



Re: [PATCH 4/4] Update README to state Python3 requirement

2023-03-17 Thread Jan Beulich
On 16.03.2023 18:16, Marek Marczykowski-Górecki wrote:
> Python2 is not supported anymore.

There are two things here which concern me: For one, how come this is
at the end of a series? You want to keep in mind that any series may
be committed piecemeal (unless an indication to the contrary is in
the cover letter, but there's none here in the first place).

The other aspect is that there's no indication here of it being
consensus that we raise the baseline requirement for Python, and for
Python alone. A decision towards the wider topic of raising baseline
requirements is, as you may recall from the meeting in Cambridge,
still pending.

Jan



Re: [PATCH v3 6/6] xen: pci: print reference counter when dumping pci_devs

2023-03-17 Thread Roger Pau Monné
On Tue, Mar 14, 2023 at 08:56:30PM +, Volodymyr Babchuk wrote:
> This can be handy during new reference counter approach evaluation.
> 
> Signed-off-by: Volodymyr Babchuk 
> 
> ---
> 
> v3:
>  - Moved from another patch series
> ---
>  xen/drivers/passthrough/pci.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index b32382aca0..1eb79e7d01 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1275,7 +1275,8 @@ static int cf_check _dump_pci_devices(struct pci_seg 
> *pseg, void *arg)
>  else
>  #endif
>  printk("%pd", pdev->domain);
> -printk(" - node %-3d", (pdev->node != NUMA_NO_NODE) ? pdev->node : 
> -1);
> +printk(" - node %-3d refcnt %d", (pdev->node != NUMA_NO_NODE) ? 
> pdev->node : -1,

This line is now too long (> 80 chars), you need to add a newline
between the format and the argument list.

The rest LGTM.

Thanks, Roger.



Re: [PATCH v8 1/5] xen: introduce CONFIG_GENERIC_BUG_FRAME

2023-03-17 Thread Oleksii
On Thu, 2023-03-16 at 12:26 +0100, Jan Beulich wrote:
> On 15.03.2023 18:21, Oleksii Kurochko wrote:
> > --- /dev/null
> > +++ b/xen/common/bug.c
> > @@ -0,0 +1,108 @@
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> 
> I actually meant to also ask: What is this needed for? Glancing over
> the
> code ...
> 
> > +/*
> > + * Returns a negative value in case of an error otherwise
> > + * BUGFRAME_{run_fn, warn, bug, assert}
> > + */
> > +int do_bug_frame(struct cpu_user_regs *regs, unsigned long pc)
> > +{
> > +    const struct bug_frame *bug = NULL;
> > +    const struct virtual_region *region;
> > +    const char *prefix = "", *filename, *predicate;
> > +    unsigned long fixup;
> > +    unsigned int id, lineno;
> > +
> > +    region = find_text_region(pc);
> > +    if ( !region )
> > +    return -EINVAL;
> > +
> > +    for ( id = 0; id < BUGFRAME_NR; id++ )
> > +    {
> > +    const struct bug_frame *b;
> > +    size_t i;
> > +
> > +    for ( i = 0, b = region->frame[id].bugs;
> > +  i < region->frame[id].n_bugs; b++, i++ )
> > +    {
> > +    if ( bug_loc(b) == pc )
> > +    {
> > +    bug = b;
> > +    goto found;
> > +    }
> > +    }
> > +    }
> > +
> > + found:
> > +    if ( !bug )
> > +    return -ENOENT;
> > +
> > +    if ( id == BUGFRAME_run_fn )
> > +    {
> > +    void (*fn)(struct cpu_user_regs *) = bug_ptr(bug);
> > +
> > +    fn(regs);
> > +
> > +    /* Re-enforce consistent types, because of the casts
> > involved. */
> > +    if ( false )
> > +    run_in_exception_handler(fn);
> > +
> > +    return id;
> > +    }
> > +
> > +    /* WARN, BUG or ASSERT: decode the filename pointer and line
> > number. */
> > +    filename = bug_ptr(bug);
> > +    if ( !is_kernel(filename) && !is_patch(filename) )
> > +    return -EINVAL;
> > +    fixup = strlen(filename);
> > +    if ( fixup > 50 )
> > +    {
> > +    filename += fixup - 47;
> > +    prefix = "...";
> > +    }
> > +    lineno = bug_line(bug);
> > +
> > +    switch ( id )
> > +    {
> > +    case BUGFRAME_warn:
> > +    printk("Xen WARN at %s%s:%d\n", prefix, filename, lineno);
> > +    show_execution_state(regs);
> > +
> > +    break;
> > +
> > +    case BUGFRAME_bug:
> > +    printk("Xen BUG at %s%s:%d\n", prefix, filename, lineno);
> > +
> > +    if ( BUG_DEBUGGER_TRAP_FATAL(regs) )
> > +    break;
> > +
> > +    show_execution_state(regs);
> > +    panic("Xen BUG at %s%s:%d\n", prefix, filename, lineno);
> > +
> > +    case BUGFRAME_assert:
> > +    /* ASSERT: decode the predicate string pointer. */
> > +    predicate = bug_msg(bug);
> > +    if ( !is_kernel(predicate) && !is_patch(predicate) )
> > +    predicate = "";
> > +
> > +    printk("Assertion '%s' failed at %s%s:%d\n",
> > +   predicate, prefix, filename, lineno);
> > +
> > +    if ( BUG_DEBUGGER_TRAP_FATAL(regs) )
> > +    break;
> > +
> > +    show_execution_state(regs);
> > +    panic("Assertion '%s' failed at %s%s:%d\n",
> > +  predicate, prefix, filename, lineno);
> > +    }
> > +
> > +    return id;
> > +}
> 
> ... I can't really spot what it might be that comes from that header.
> Oh, on the N+1st run I've spotted it - it's show_execution_state().
> The declaration of which, already being used from common code ahead
> of this series, should imo be moved to a common header. I guess I'll
> make yet another patch ...
As mentioned above. Not only show_execution_state() but also
cpu_user_regs structure. ( at lest, for ARM & RISCV )

~ Oleksii


Re: [PATCH v2] xen/console: Skip switching serial input to non existing domains

2023-03-17 Thread Michal Orzel



On 17/03/2023 09:36, Jan Beulich wrote:
> 
> 
> On 16.03.2023 23:59, Stefano Stabellini wrote:
>> On Thu, 16 Mar 2023, Jan Beulich wrote:
>>> On 16.03.2023 11:26, Michal Orzel wrote:
 --- a/xen/drivers/char/console.c
 +++ b/xen/drivers/char/console.c
 @@ -490,7 +490,24 @@ static void switch_serial_input(void)
  }
  else
  {
 -console_rx++;
 +unsigned int next_rx = console_rx + 1;
 +
 +/* Skip switching serial input to non existing domains */
 +while ( next_rx < max_init_domid + 1 )
 +{
 +struct domain *d = rcu_lock_domain_by_id(next_rx - 1);
 +
 +if ( d )
 +{
 +rcu_unlock_domain(d);
 +break;
 +}
 +
 +next_rx++;
 +}
 +
 +console_rx = next_rx;
 +
  printk("*** Serial input to DOM%d", console_rx - 1);
  }
>>>
>>> While at the first glance (when you sent it in reply to v1) it looked okay,
>>> I'm afraid it really isn't: Please consider what happens when the last of
>>> the DomU-s doesn't exist anymore. (You don't really check whether it still
>>> exists, because the range check comes ahead of the existence one.) In that
>>> case you want to move from second-to-last to Xen. I expect the entire
>>> if/else construct wants to be inside the loop.
>>
>> I don't think we need another loop, just a check if we found a domain or
> 
> I didn't say "another loop", but I suggested that the loop needs to be
> around the if/else. Of course this can be transformed into equivalent
> forms, like ...
> 
>> not. E.g.:
>>
>>
>> unsigned int next_rx = console_rx + 1;
>>
>> /* Skip switching serial input to non existing domains */
>> while ( next_rx < max_init_domid + 1 )
>> {
>> struct domain *d = rcu_lock_domain_by_id(next_rx - 1);
>>
>> if ( d )
>> {
>> rcu_unlock_domain(d);
>> console_rx = next_rx;
>> printk("*** Serial input to DOM%d", console_rx - 1);
>> break;
>> }
>>
>> next_rx++;
>> }
>>
>> /* no domain found */
>> console_rx = 0;
>> printk("*** Serial input to Xen");
> 
> ... what you suggest (or at least almost, because the way it's written
> we'd always switch to Xen).

I would prefer a loop with if/else inside. If you are ok with the following code
that handles all the cases, I will push a patch in a minute:

diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 51e5408f2114..96ec3bbcf541 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -483,15 +483,34 @@ struct domain *console_input_domain(void)
 
 static void switch_serial_input(void)
 {
-if ( console_rx == max_init_domid + 1 )
-{
-console_rx = 0;
-printk("*** Serial input to Xen");
-}
-else
+unsigned int next_rx = console_rx + 1;
+
+/*
+ * Rotate among Xen, dom0 and boot-time created domUs while skipping
+ * switching serial input to non existing domains.
+ */
+while ( next_rx <= max_init_domid + 2 )
 {
-console_rx++;
-printk("*** Serial input to DOM%d", console_rx - 1);
+if ( next_rx == max_init_domid + 2 )
+{
+console_rx = 0;
+printk("*** Serial input to Xen");
+break;
+}
+else
+{
+struct domain *d = rcu_lock_domain_by_id(next_rx - 1);
+
+if ( d )
+{
+rcu_unlock_domain(d);
+console_rx = next_rx;
+printk("*** Serial input to DOM%d", console_rx - 1);
+break;
+}
+
+next_rx++;
+}
 }
 
 if ( switch_code )

~Michal



Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi

2023-03-17 Thread Roger Pau Monné
On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> On 17.03.2023 00:19, Stefano Stabellini wrote:
> > On Thu, 16 Mar 2023, Jan Beulich wrote:
> >> So yes, it then all boils down to that Linux-
> >> internal question.
> > 
> > Excellent question but we'll have to wait for Ray as he is the one with
> > access to the hardware. But I have this data I can share in the
> > meantime:
> > 
> > [1.260378] IRQ to pin mappings:
> > [1.260387] IRQ1 -> 0:1
> > [1.260395] IRQ2 -> 0:2
> > [1.260403] IRQ3 -> 0:3
> > [1.260410] IRQ4 -> 0:4
> > [1.260418] IRQ5 -> 0:5
> > [1.260425] IRQ6 -> 0:6
> > [1.260432] IRQ7 -> 0:7
> > [1.260440] IRQ8 -> 0:8
> > [1.260447] IRQ9 -> 0:9
> > [1.260455] IRQ10 -> 0:10
> > [1.260462] IRQ11 -> 0:11
> > [1.260470] IRQ12 -> 0:12
> > [1.260478] IRQ13 -> 0:13
> > [1.260485] IRQ14 -> 0:14
> > [1.260493] IRQ15 -> 0:15
> > [1.260505] IRQ106 -> 1:8
> > [1.260513] IRQ112 -> 1:4
> > [1.260521] IRQ116 -> 1:13
> > [1.260529] IRQ117 -> 1:14
> > [1.260537] IRQ118 -> 1:15
> > [1.260544]  done.
> 
> And what does Linux think are IRQs 16 ... 105? Have you compared with
> Linux running baremetal on the same hardware?

So I have some emails from Ray from he time he was looking into this,
and on Linux dom0 PVH dmesg there is:

[0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec0, GSI 0-23
[0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55

So it seems the vIO-APIC data provided by Xen to dom0 is at least
consistent.
 
> > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > 112 (which is the one causing issues):
> > 
> > __acpi_register_gsi->
> > acpi_register_gsi_ioapic->
> > mp_map_gsi_to_irq->
> > mp_map_pin_to_irq->
> > __irq_resolve_mapping()
> > 
> > if (likely(data)) {
> > desc = irq_data_to_desc(data);
> > if (irq)
> > *irq = data->irq;
> > /* this IRQ is 112, IO-APIC-34 domain */
> > }


Could this all be a result of patch 4/5 in the Linux series ("[RFC
PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
__acpi_register_gsi hook is installed for PVH in order to setup GSIs
using PHYSDEV ops instead of doing it natively from the IO-APIC?

FWIW, the introduced function in that patch
(acpi_register_gsi_xen_pvh()) seems to unconditionally call
acpi_register_gsi_ioapic() without checking if the GSI is already
registered, which might lead to multiple IRQs being allocated for the
same underlying GSI?

As I commented there, I think that approach is wrong.  If the GSI has
not been mapped in Xen (because dom0 hasn't unmasked the respective
IO-APIC pin) we should add some logic in the toolstack to map it
before attempting to bind.

Thanks, Roger.



Re: [PATCH v3 1/6] xen: add reference counter support

2023-03-17 Thread Roger Pau Monné
On Thu, Mar 16, 2023 at 05:56:00PM +0100, Jan Beulich wrote:
> On 16.03.2023 17:48, Roger Pau Monné wrote:
> > On Thu, Mar 16, 2023 at 05:43:18PM +0100, Jan Beulich wrote:
> >> On 16.03.2023 17:39, Roger Pau Monné wrote:
> >>> On Thu, Mar 16, 2023 at 05:32:38PM +0100, Jan Beulich wrote:
>  On 16.03.2023 17:19, Roger Pau Monné wrote:
> > On Tue, Mar 14, 2023 at 08:56:29PM +, Volodymyr Babchuk wrote:
> >> +static inline void refcnt_get(refcnt_t *refcnt)
> >> +{
> >> +int old = atomic_add_unless(&refcnt->refcnt, 1, 0);
> >
> > Occurred to me while looking at the next patch:
> >
> > Don't you also need to print a warning (and saturate the counter
> > maybe?) if old == 0, as that would imply the caller is attempting
> > to take a reference of an object that should be destroyed? IOW: it
> > would point to some kind of memory leak.
> 
>  Hmm, I notice the function presently returns void. I think what to do
>  when the counter is zero needs leaving to the caller. See e.g.
>  get_page() which will simply indicate failure to the caller in case
>  the refcnt is zero. (There overflow handling also is left to the
>  caller ... All that matters is whether a ref can be acquired.)
> >>>
> >>> Hm, likely.  I guess pages never go away even when it's refcount
> >>> reaches 0.
> >>>
> >>> For the pdev case attempting to take a refcount on an object that has
> >>> 0 refcounts implies that the caller is using leaked memory, as the
> >>> point an object reaches 0 it supposed to be destroyed.
> >>
> >> Hmm, my thinking was that a device would remain at refcnt 0 until it is
> >> actually removed, i.e. refcnt == 0 being a prereq for pci_remove_device()
> >> to be willing to do anything at all. But maybe that's not a viable model.
> > 
> > Right, I think the intention was for pci_remove_device() to drop the
> > refcount to 0 and do the removal, so the refcount should be 1 when
> > calling pci_remove_device().  But none of this is written down, so
> > it's mostly my assumptions from looking at the code.
> 
> Could such work at all? The function can't safely drop a reference
> and _then_ check whether it was the last one. The function either has
> to take refcnt == 0 as prereq, or it needs to be the destructor
> function that refcnt_put() calls.

But then you also get in the trouble of asserting that refcnt == 0
doesn't change between evaluation and actual removal of the structure.

Should all refcounts to pdev be taken and dropped while holding the
pcidevs lock?

I there an email (outside of this series) that contains a description
of how the refcounting is to be used with pdevs?

Thanks, Roger.



Re: [PATCH v2] hvc/xen: prevent concurrent accesses to the shared ring

2023-03-17 Thread Roger Pau Monné
On Fri, Mar 10, 2023 at 10:18:32AM +0100, Roger Pau Monné wrote:
> On Fri, Mar 10, 2023 at 10:01:39AM +1100, Michael Ellerman wrote:
> > Roger Pau Monné  writes:
> > > On Mon, Dec 12, 2022 at 01:36:48PM +0100, Roger Pau Monné wrote:
> > >> On Fri, Dec 02, 2022 at 12:40:05PM +0100, Roger Pau Monné wrote:
> > >> > On Wed, Nov 30, 2022 at 05:08:06PM -0800, Stefano Stabellini wrote:
> > >> > > On Wed, 30 Nov 2022, Roger Pau Monne wrote:
> > >> > > > The hvc machinery registers both a console and a tty device based 
> > >> > > > on
> > >> > > > the hv ops provided by the specific implementation.  Those two
> > >> > > > interfaces however have different locks, and there's no single 
> > >> > > > locks
> > >> > > > that's shared between the tty and the console implementations, 
> > >> > > > hence
> > >> > > > the driver needs to protect itself against concurrent accesses.
> > >> > > > Otherwise concurrent calls using the split interfaces are likely to
> > >> > > > corrupt the ring indexes, leaving the console unusable.
> > >> > > >
> > >> > > > Introduce a lock to xencons_info to serialize accesses to the 
> > >> > > > shared
> > >> > > > ring.  This is only required when using the shared memory console,
> > >> > > > concurrent accesses to the hypercall based console implementation 
> > >> > > > are
> > >> > > > not an issue.
> > >> > > >
> > >> > > > Note the conditional logic in domU_read_console() is slightly 
> > >> > > > modified
> > >> > > > so the notify_daemon() call can be done outside of the locked 
> > >> > > > region:
> > >> > > > it's an hypercall and there's no need for it to be done with the 
> > >> > > > lock
> > >> > > > held.
> > >> > >
> > >> > > For domU_read_console: I don't mean to block this patch but we need 
> > >> > > to
> > >> > > be sure about the semantics of hv_ops.get_chars. Either it is 
> > >> > > expected
> > >> > > to be already locked, then we definitely shouldn't add another lock 
> > >> > > to
> > >> > > domU_read_console. Or it is not expected to be already locked, then 
> > >> > > we
> > >> > > should add the lock.
> > >> > >
> > >> > > My impression is that it is expected to be already locked, but I 
> > >> > > think
> > >> > > we need Greg or Jiri to confirm one way or the other.
> > >> >
> > >> > Let me move both to the 'To:' field then.
> > >> >
> > >> > My main concern is the usage of hv_ops.get_chars hook in
> > >> > hvc_poll_get_char(), as it's not obvious to me that callers of
> > >> > tty->poll_get_char hook as returned by tty_find_polling_driver() will
> > >> > always do so with the tty lock held (in fact the only user right now
> > >> > doesn't seem to hold the tty lock).
> > >> >
> > >> > > Aside from that the rest looks fine.
> > >
> > > I guess I could reluctantly remove the lock in the get_chars hook,
> > > albeit I'm not convinced at all the lock is not needed there.
> > 
> > I don't know the xen driver, but other HVC backends have a lock around
> > their private state in their get_chars() implementations.
> > 
> > See eg. hvterm_raw_get_chars().
> 
> Yes, that was one of the motivation for adding the lock also here, and
> it has already been mentioned.  The other is the usage of the hooks by
> callers of hvc_poll_get_char().

Ping?

Thanks, Roger.



Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh

2023-03-17 Thread Roger Pau Monné
On Thu, Mar 16, 2023 at 04:09:44PM -0700, Stefano Stabellini wrote:
> On Thu, 16 Mar 2023, Juergen Gross wrote:
> > On 16.03.23 14:53, Alex Deucher wrote:
> > > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross  wrote:
> > > > 
> > > > On 16.03.23 14:45, Alex Deucher wrote:
> > > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich  wrote:
> > > > > > 
> > > > > > On 16.03.2023 00:25, Stefano Stabellini wrote:
> > > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote:
> > > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote:
> > > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote:
> > > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote:
> > > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of
> > > > > > > > > > > hardware
> > > > > > > > > > > virtualization support when possible. It will using the
> > > > > > > > > > > hardware IOMMU
> > > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if
> > > > > > > > > > > current domain is
> > > > > > > > > > > Xen PVH.
> > > > > > > > > > 
> > > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how 
> > > > > > > > > > can
> > > > > > > > > > it get
> > > > > > > > > > away without resorting to swiotlb in certain cases (like I/O
> > > > > > > > > > to an
> > > > > > > > > > address-restricted device)?
> > > > > > > > > 
> > > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, 
> > > > > > > > > there
> > > > > > > > > is no
> > > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by
> > > > > > > > > the IOMMU
> > > > > > > > > so we can use guest physical addresses instead of machine
> > > > > > > > > addresses for
> > > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is
> > > > > > > > > available
> > > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the
> > > > > > > > > corresponding
> > > > > > > > > case is XENFEAT_not_direct_mapped).
> > > > > > > > 
> > > > > > > > But how does Xen using an IOMMU help with, as said,
> > > > > > > > address-restricted
> > > > > > > > devices? They may still need e.g. a 32-bit address to be
> > > > > > > > programmed in,
> > > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O
> > > > > > > > buffers
> > > > > > > > may fulfill this requirement.
> > > > > > > 
> > > > > > > In short, it is going to work as long as Linux has guest physical
> > > > > > > addresses (not machine addresses, those could be anything) lower
> > > > > > > than
> > > > > > > 4GB.
> > > > > > > 
> > > > > > > If the address-restricted device does DMA via an IOMMU, then the
> > > > > > > device
> > > > > > > gets programmed by Linux using its guest physical addresses (not
> > > > > > > machine
> > > > > > > addresses).
> > > > > > > 
> > > > > > > The 32-bit restriction would be applied by Linux to its choice of
> > > > > > > guest
> > > > > > > physical address to use to program the device, the same way it 
> > > > > > > does
> > > > > > > on
> > > > > > > native. The device would be fine as it always uses Linux-provided
> > > > > > > <4GB
> > > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), 
> > > > > > > we
> > > > > > > could get any address, including >4GB addresses, and that is
> > > > > > > expected to
> > > > > > > work.
> > > > > > 
> > > > > > I understand that's the "normal" way of working. But whatever the
> > > > > > swiotlb
> > > > > > is used for in baremetal Linux, that would similarly require its use
> > > > > > in
> > > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look
> > > > > > to
> > > > > > me like an incomplete attempt to disable its use altogether on x86.
> > > > > > What
> > > > > > difference of PVH vs baremetal am I missing here?
> > > > > 
> > > > > swiotlb is not usable for GPUs even on bare metal.  They often have
> > > > > hundreds or megs or even gigs of memory mapped on the device at any
> > > > > given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > > > > the chip family).
> > > > 
> > > > But the swiotlb isn't per device, but system global.
> > > 
> > > Sure, but if the swiotlb is in use, then you can't really use the GPU.
> > > So you get to pick one.
> > 
> > The swiotlb is used only for buffers which are not within the DMA mask of a
> > device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA 
> > mask
> > won't use the swiotlb unless you have a buffer above guest physical address 
> > of
> > 16TB (so basically never).
> > 
> > Disabling swiotlb in such a guest would OTOH mean, that a device with only
> > 32 bit DMA mask passed through to this guest couldn't work with buffers
> > above 4GB.
> > 
> > I don't think this is acceptable.
> 
> From the Xen subsystem in Linux point of view, the only thing we need to
> do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
> the global swiotlb) on PVH because it is not needed anyway.

But this is already the case o

[PATCH] Fix PCI hotplug AML

2023-03-17 Thread David Woodhouse
From: David Woodhouse 

The emulated PIIX3 uses a nybble for the status of each PCI function,
so the status for e.g. slot 0 functions 0 and 1 respectively can be
read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04).

The AML that Xen gives to a guest gets the operand order for the odd-
numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00)
instead.

As far as I can tell, this was the wrong way round in Xen from the
moment that PCI hotplug was first introduced in commit 83d82e6f35a8:

+ShiftRight (0x4, \_GPE.PH00, Local1)
+Return (Local1) /* IN status as the _STA */

Or maybe there's bizarre AML operand ordering going on there, like
Intel's wrong-way-round assembler, and it only broke later when it was
changed to being generated?

Either way, it's definitely wrong now, and instrumenting a Linux guest
shows that it correctly sees _STA being 0x00 in function 0 of an empty
slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to
look at function 1 and sees that _STA evaluates to 0x04. Thus reporting
an adapter is present in every slot in /sys/bus/pci/slots/*

Quite why Linux wants to look for function 1 being physically present
when function 0 isn't... I don't want to think about right now.

Signed-off-by: David Woodhouse 
Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug")
---
Utterly untested in Xen. Tested the same change in a different
environment which is using precisely the *same* AML for guest
compatibility.

diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 1176da80ef..1d27809116 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -431,7 +431,7 @@ int main(int argc, char **argv)
 stmt("Store", "0x89, \\_GPE.DPT2");
 }
 if ( slot & 1 )
-stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1);
+stmt("ShiftRight", "\\_GPE.PH%02X, 0x04, Local1", slot & ~1);
 else
 stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1);
 stmt("Return", "Local1"); /* IN status as the _STA */



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH v2 2/2] tools/misc: Add xen-vcpus-stats tool

2023-03-17 Thread Matias Ezequiel Vara Larsen
On Thu, Feb 23, 2023 at 08:31:29PM +, Julien Grall wrote:
> Hi,
> 
> On 23/02/2023 16:01, Andrew Cooper wrote:
> > On 07/10/2022 1:39 pm, Matias Ezequiel Vara Larsen wrote:
> > 
> > A couple of observations, all unrelated to the stats themselves.
> > 
> > Although overall, I'm not entirely certain that a tool like this is
> > going to be very helpful after initial development.  Something to
> > consider would be to alter libxenstat to use this new interface?
> > 
> > > diff --git a/tools/misc/Makefile b/tools/misc/Makefile
> > > index 2b683819d4..837e4b50da 100644
> > > --- a/tools/misc/Makefile
> > > +++ b/tools/misc/Makefile
> > > @@ -49,6 +49,7 @@ TARGETS_COPY += xenpvnetboot
> > > 
> > > # Everything which needs to be built
> > > TARGETS_BUILD := $(filter-out $(TARGETS_COPY),$(TARGETS_ALL))
> > > +TARGETS_BUILD += xen-vcpus-stats
> > 
> > This patch is whitespace corrupted.  If at all possible, you need to see
> > about getting `git send-email` working to send patches with, as it deals
> > with most of the whitespace problems for you.
> > 
> > I'm afraid you can't simply copy the patch text into an email and send that.
> > 
> > > 
> > > # ... including build-only targets
> > > TARGETS_BUILD-$(CONFIG_X86)    += xen-vmtrace
> > > @@ -135,4 +136,9 @@ xencov: xencov.o
> > > xen-ucode: xen-ucode.o
> > >  $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
> > > 
> > > +xen-vcpus-stats.o: CFLAGS += $(CFLAGS_libxenforeginmemory)
> > > +
> > > +xen-vcpus-stats: xen-vcpus-stats.o
> > > +    $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl)
> > > $(LDLIBS_libxenforeignmemory) $(APPEND_LDFLAGS)
> > > +
> > > -include $(DEPS_INCLUDE)
> > > diff --git a/tools/misc/xen-vcpus-stats.c b/tools/misc/xen-vcpus-stats.c
> > > new file mode 100644
> > > index 00..29d0efb124
> > > --- /dev/null
> > > +++ b/tools/misc/xen-vcpus-stats.c
> > > @@ -0,0 +1,87 @@
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#define rmb()   asm volatile("lfence":::"memory")
> > 
> > This is rmb(), but rmb() isn't what you want.
> > 
> > You want smp_rmb(), which is
> > 
> > #define smp_rmb() asm volatile ("" ::: "memory")
> 
> From the generic PoV, I find smp_rmb() a bit misleading because it is not
> clear in this context whether we are referring to the SMP-ness of the
> hypervisor or the tools domain.
> 
> If the latter, then technically it could be uniprocessor domain and one
> could argue that for Arm it could be downgraded to just a compiler barrier.
> 
> AFAICT, this would not be the case here because we are getting data from
> Xen. So we always need a "dmb ish".
> 
> So, I would suggest to name it virt_*() (to match Linux's naming).
> 
> Also, is this tool meant to be arch-agnostic? If so, then we need to
> introduce the proper barrier for the other arch.
> 
Thanks Julien for the comment. Is it `xen_rmb()` meant for that?

Matias



[xen-unstable test] 179697: tolerable trouble: fail/pass/starved - PUSHED

2023-03-17 Thread osstest service owner
flight 179697 xen-unstable real [real]
flight 179714 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/179697/
http://logs.test-lab.xenproject.org/osstest/logs/179714/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-install fail pass in 
179714-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 179668
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 179668
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 179668
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 179668
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 179668
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 179668
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 179668
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 179668
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 179668
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 build-armhf-libvirt   1 build-check(1)   starved  n/a
 test-armhf-armhf-examine  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   starved  n/a
 build-armhf   2 hosts-allocate   starved  n/a

version targeted for testing:
 xen  36e49fc8cbad21a4308b4701caaa685ef04e120b
baseline version:
 xen  b2ea81d2b935474cf27a76b4aa143ae897e82457

Last test of basis   179668  2023-03-16 06:30:55 Z1 days
Testing same since   179697  2023-03-16 20:45:20 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Henry Wang 
  Jan Beulich 
  Jason Andryuk 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-

Re: [XEN PATCH v2] x86/monitor: Add new monitor event to catch I/O instructions

2023-03-17 Thread Andrew Cooper
On 15/03/2023 6:54 pm, Dmitry Isaykin wrote:
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index a43bcf2e92..49225f48a7 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2939,17 +2939,26 @@ void svm_vmexit_handler(void)
>  break;
>  
>  case VMEXIT_IOIO:
> -if ( (vmcb->exitinfo1 & (1u<<2)) == 0 )
> + {
> +uint16_t port = (vmcb->exitinfo1 >> 16) & 0x;
> +int bytes = ((vmcb->exitinfo1 >> 4) & 0x07);
> +int dir = (vmcb->exitinfo1 & 1) ? IOREQ_READ : IOREQ_WRITE;
> +bool string_ins = (vmcb->exitinfo1 & (1u<<2));
> +int rc = hvm_monitor_io(port, bytes, dir, string_ins);
> +if ( rc < 0 )
> +goto unexpected_exit_type;
> +if ( !rc )
>  {
> -uint16_t port = (vmcb->exitinfo1 >> 16) & 0x;
> -int bytes = ((vmcb->exitinfo1 >> 4) & 0x07);
> -int dir = (vmcb->exitinfo1 & 1) ? IOREQ_READ : IOREQ_WRITE;
> -if ( handle_pio(port, bytes, dir) )
> -__update_guest_eip(regs, vmcb->exitinfo2 - vmcb->rip);
> +if ( !string_ins )
> +{
> +if ( handle_pio(port, bytes, dir) )
> +__update_guest_eip(regs, vmcb->exitinfo2 - vmcb->rip);
> +}
> +else if ( !hvm_emulate_one_insn(x86_insn_is_portio, "port I/O") )
> +hvm_inject_hw_exception(TRAP_gp_fault, 0);
>  }
> -else if ( !hvm_emulate_one_insn(x86_insn_is_portio, "port I/O") )
> -hvm_inject_hw_exception(TRAP_gp_fault, 0);
>  break;
> +}

There are a few style issues, but it's also a mess because of the manual
exitinfo decoding, so I went ahead and did

https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=df9369154aa010b2322e3f3e0727a242784cfd4f

to clean it up.  The rebased version of this hunk is now:

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index bfe03316def6..17ac99f6cd56 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2939,6 +2939,15 @@ void svm_vmexit_handler(void)
 break;
 
 case VMEXIT_IOIO:
+    rc = hvm_monitor_io(vmcb->ei.io.port,
+    vmcb->ei.io.bytes,
+    vmcb->ei.io.in ? IOREQ_READ : IOREQ_WRITE,
+    vmcb->ei.io.str);
+    if ( rc < 0 )
+    goto unexpected_exit_type;
+    if ( rc )
+    break;
+
 if ( !vmcb->ei.io.str )
 {
 if ( handle_pio(vmcb->ei.io.port,

which I hope you'll agree is much more simple to follow.

I'm also trying to sort out a similar cleanup on the Intel side, but
haven't managed to post that yet.

~Andrew



[PATCH 0/2] tools/xl: small cleanup of parsing code

2023-03-17 Thread Juergen Gross
2 small patches cleaning up the parsing code in xl a little bit.

Juergen Gross (2):
  tools/xl: allow split_string_into_pair() to trim values
  tools/xl: rework p9 config parsing

 tools/xl/xl_parse.c | 114 +---
 tools/xl/xl_parse.h |   4 +-
 2 files changed, 56 insertions(+), 62 deletions(-)

-- 
2.35.3




[PATCH 1/2] tools/xl: allow split_string_into_pair() to trim values

2023-03-17 Thread Juergen Gross
Most use cases of split_string_into_pair() are requiring the returned
strings to be white space trimmed.

In order to avoid the same code pattern multiple times, add a predicate
parameter to split_string_into_pair() which can be specified to call
trim() with that predicate for the string pair returned. Specifying
NULL for the predicate will avoid the call of trim().

Signed-off-by: Juergen Gross 
---
 tools/xl/xl_parse.c | 42 +++---
 tools/xl/xl_parse.h |  4 ++--
 2 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 853e9f357a..2f9dfea05c 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -646,7 +646,7 @@ static void parse_vnuma_config(const XLU_Config *config,
  conf_count++) {
 
 if (xlu_cfg_value_type(conf_option) == XLU_STRING) {
-char *buf, *option_untrimmed, *value_untrimmed;
+char *buf;
 char *option, *value;
 unsigned long val;
 
@@ -654,15 +654,12 @@ static void parse_vnuma_config(const XLU_Config *config,
 
 if (!buf) continue;
 
-if (split_string_into_pair(buf, "=",
-   &option_untrimmed,
-   &value_untrimmed)) {
+if (split_string_into_pair(buf, "=", &option, &value,
+   isspace)) {
 fprintf(stderr, "xl: failed to split \"%s\" into pair\n",
 buf);
 exit(EXIT_FAILURE);
 }
-trim(isspace, option_untrimmed, &option);
-trim(isspace, value_untrimmed, &value);
 
 if (!strcmp("pnode", option)) {
 val = parse_ulong(value);
@@ -715,8 +712,6 @@ static void parse_vnuma_config(const XLU_Config *config,
 }
 free(option);
 free(value);
-free(option_untrimmed);
-free(value_untrimmed);
 }
 }
 }
@@ -838,7 +833,7 @@ int parse_vdispl_config(libxl_device_vdispl *vdispl, char 
*token)
 
 rc = split_string_into_pair(connectors[i], ":",
 &vdispl->connectors[i].unique_id,
-&resolution);
+&resolution, NULL);
 
 rc= sscanf(resolution, "%ux%u", &vdispl->connectors[i].width,
&vdispl->connectors[i].height);
@@ -2292,18 +2287,15 @@ void parse_config_data(const char *config_source,
 split_string_into_string_list(buf, ",", &pairs);
 len = libxl_string_list_length(&pairs);
 for (i = 0; i < len; i++) {
-char *key, *key_untrimmed, *value, *value_untrimmed;
+char *key, *value;
 int rc;
-rc = split_string_into_pair(pairs[i], "=",
-&key_untrimmed,
-&value_untrimmed);
+rc = split_string_into_pair(pairs[i], "=", &key, &value,
+isspace);
 if (rc != 0) {
 fprintf(stderr, "failed to parse channel configuration: 
%s",
 pairs[i]);
 exit(1);
 }
-trim(isspace, key_untrimmed, &key);
-trim(isspace, value_untrimmed, &value);
 
 if (!strcmp(key, "backend")) {
 replace_string(&chn->backend_domname, value);
@@ -2326,9 +2318,7 @@ void parse_config_data(const char *config_source,
   " ignoring\n", key);
 }
 free(key);
-free(key_untrimmed);
 free(value);
-free(value_untrimmed);
 }
 switch (chn->connection) {
 case LIBXL_CHANNEL_CONNECTION_UNKNOWN:
@@ -2952,10 +2942,8 @@ void trim(char_predicate_t predicate, const char *input, 
char **output)
 *output = result;
 }
 
-int split_string_into_pair(const char *str,
-   const char *delim,
-   char **a,
-   char **b)
+int split_string_into_pair(const char *str, const char *delim,
+   char **a, char **b, char_predicate_t predicate)
 {
 char *s, *p, *saveptr, *aa = NULL, *bb = NULL;
 int rc = 0;
@@ -2967,13 +2955,21 @@ int split_string_into_pair(const char *str,
 rc = ERROR_INVAL;
 goto out;
 }
-aa = xstrdup(p);
+if (predicate) {
+trim(predicate, p, &aa);
+} else {
+aa = xstrdup(p);
+}
 p = strtok_r(NULL, delim, &saveptr);
 if (p == NULL) {
 rc = ERROR_INVAL;
 goto out;
 }
-bb = xstrdup(p);
+if

[PATCH 2/2] tools/xl: rework p9 config parsing

2023-03-17 Thread Juergen Gross
Rework the config parsing of a p9 device to use the
split_string_into_pair() function instead of open coding it.

Signed-off-by: Juergen Gross 
---
 tools/xl/xl_parse.c | 72 ++---
 1 file changed, 35 insertions(+), 37 deletions(-)

diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 2f9dfea05c..715e14f95f 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2111,54 +2111,52 @@ void parse_config_data(const char *config_source,
 
 if (!xlu_cfg_get_list(config, "p9", &p9devs, 0, 0)) {
 libxl_device_p9 *p9;
-char *security_model = NULL;
-char *path = NULL;
-char *tag = NULL;
-char *backend = NULL;
-char *p, *p2, *buf2;
 
 d_config->num_p9s = 0;
 d_config->p9s = NULL;
 while ((buf = xlu_cfg_get_listitem (p9devs, d_config->num_p9s)) != 
NULL) {
+libxl_string_list pairs;
+int len;
+
 p9 = ARRAY_EXTEND_INIT(d_config->p9s,
d_config->num_p9s,
libxl_device_p9_init);
 libxl_device_p9_init(p9);
 
-buf2 = strdup(buf);
-p = strtok(buf2, ",");
-if(p) {
-   do {
-  while(*p == ' ')
- ++p;
-  if ((p2 = strchr(p, '=')) == NULL)
- break;
-  *p2 = '\0';
-  if (!strcmp(p, "security_model")) {
- security_model = strdup(p2 + 1);
-  } else if(!strcmp(p, "path")) {
- path = strdup(p2 + 1);
-  } else if(!strcmp(p, "tag")) {
- tag = strdup(p2 + 1);
-  } else if(!strcmp(p, "backend")) {
- backend = strdup(p2 + 1);
-  } else {
- fprintf(stderr, "Unknown string `%s' in 9pfs spec\n", p);
- exit(1);
-  }
-   } while ((p = strtok(NULL, ",")) != NULL);
-}
-if (!path || !security_model || !tag) {
-   fprintf(stderr, "9pfs spec missing required field!\n");
-   exit(1);
+split_string_into_string_list(buf, ",", &pairs);
+len = libxl_string_list_length(&pairs);
+for (i = 0; i < len; i++) {
+char *key, *value;
+int rc;
+
+rc = split_string_into_pair(pairs[i], "=", &key, &value,
+isspace);
+if (rc != 0) {
+fprintf(stderr, "failed to parse 9pfs configuration: %s",
+pairs[i]);
+exit(1);
+}
+
+if (!strcmp(key, "security_model")) {
+replace_string(&p9->security_model, value);
+} else if (!strcmp(key, "path")) {
+replace_string(&p9->path, value);
+} else if (!strcmp(key, "tag")) {
+replace_string(&p9->tag, value);
+} else if (!strcmp(key, "backend")) {
+replace_string(&p9->backend_domname, value);
+} else {
+fprintf(stderr, "Unknown 9pfs parameter '%s'\n", key);
+exit(1);
+}
+free(key);
+free(value);
 }
-free(buf2);
 
-replace_string(&p9->tag, tag);
-replace_string(&p9->security_model, security_model);
-replace_string(&p9->path, path);
-if (backend)
-replace_string(&p9->backend_domname, backend);
+if (!p9->path || !p9->security_model || !p9->tag) {
+fprintf(stderr, "9pfs spec missing required field!\n");
+exit(1);
+}
 }
 }
 
-- 
2.35.3




Re: [PATCH v2] x86: use POPCNT for hweight() when available

2023-03-17 Thread Roger Pau Monné
On Mon, Jul 15, 2019 at 02:39:04PM +, Jan Beulich wrote:
> This is faster than using the software implementation, and the insn is
> available on all half-way recent hardware. Therefore convert
> generic_hweight() to out-of-line functions (without affecting Arm)
> and use alternatives patching to replace the function calls.
> 
> Note that the approach doesn#t work for clang, due to it not recognizing
> -ffixed-*.

I've been giving this a look, and I wonder if it would be fine to
simply push and pop the scratch registers in the 'call' path of the
alternative, as that won't require any specific compiler option.

Thanks, Roger.



Re: [PATCH] x86: extend coverage of HLE "bad page" workaround

2023-03-17 Thread Roger Pau Monné
On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
> On 26.05.2020 17:01, Andrew Cooper wrote:
> > On 26/05/2020 14:35, Jan Beulich wrote:
> >> On 26.05.2020 13:17, Andrew Cooper wrote:
> >>> On 26/05/2020 07:49, Jan Beulich wrote:
>  Respective Core Gen10 processor lines are affected, too.
> 
>  Signed-off-by: Jan Beulich 
> 
>  --- a/xen/arch/x86/mm.c
>  +++ b/xen/arch/x86/mm.c
>  @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>   case 0x000506e0: /* errata SKL167 / SKW159 */
>   case 0x000806e0: /* erratum KBL??? */
>   case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>  +case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>  +case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
> >>> This is marred in complexity.
> >>>
> >>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
> >>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
> >>> no longer exists/works.  This applies to IceLake systems, but possibly
> >>> not their initial release configuration (hence, via a later microcode
> >>> update).
> >>>
> >>> HLE is also disabled in microcode on all older parts for errata reasons,
> >>> so in practice it doesn't exist anywhere now.
> >>>
> >>> I think it is safe to drop this workaround, and this does seem a more
> >>> simple option than encoding which microcode turned HLE off (which sadly
> >>> isn't covered by the spec updates, as even when turned off, HLE is still
> >>> functioning according to its spec of "may speed things up, may do
> >>> nothing"), or the interactions with the CPUID hiding capabilities of
> >>> MSR_TSX_CTRL.
> >> I'm afraid I don't fully follow: For one, does what you say imply HLE is
> >> no longer enumerated in CPUID?
> > 
> > No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
> > microcode fiasco", the HLE bit will continue to exist and be set. 
> > (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)
> > 
> > It was always a weird CPUID bit.  You were supposed to put
> > XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
> > on old hardware and go faster on newer hardware.
> > 
> > There is nothing runtime code needs to look at the HLE bit for, except
> > perhaps for UI reporting purposes.
> 
> Do you know of some public Intel doc I could reference for all of this,
> which I would kind of need in the description of a patch ...
> 
> >> But then this
> >> erratum does not have the usual text effectively meaning that an ucode
> >> update is or will be available to address the issue; instead it says
> >> that BIOS or VMM can reserve the respective address range.
> > 
> > This is not surprising at all.  Turning off HLE was an unrelated
> > activity, and I bet the link went unnoticed.
> > 
> >> This - assuming the alternative you describe is indeed viable - then is 
> >> surely
> >> a much more intrusive workaround than needed. Which I wouldn't assume
> >> they would suggest in such a case.
> > 
> > My suggestion was to drop the workaround, not to complicated it with a
> > microcode revision matrix.
> 
> ... doing this? I don't think I've seen any of this in writing so far,
> except by you. (I don't understand how this reply of yours relates to
> what I was saying about the spec update. I understand what you are
> suggesting. I merely tried to express that I'd have expected Intel to
> point out the much easier workaround, rather than just a pretty involved
> one.) Otherwise, may I suggest you make such a patch, to make sure it
> has an adequate description?

Seeing as there seems to be some data missing to justify the commit -
was has Linux done with those erratas?

Thanks, Roger.



[XEN PATCH v3] x86/monitor: Add new monitor event to catch I/O instructions

2023-03-17 Thread Dmitry Isaykin
Adds monitor support for I/O instructions.

Signed-off-by: Dmitry Isaykin 
Signed-off-by: Anton Belousov 
Acked-by: Tamas K Lengyel 
---
Changes in v3:
 * Rebase on staging
 * Refactor branch logic on monitor_traps response

Changes in v2:
 * Handled INS and OUTS instructions too
 * Added I/O monitoring support for AMD
 * Rename functions and structures (remove "_instruction" part)
 * Reorder parameters of hvm_monitor_io to match handle_pio's order
 * Change type of string_ins parameter to bool
 * Change vm_event_io structure
 * Handle monitor_traps's return status
---
 tools/include/xenctrl.h|  1 +
 tools/libs/ctrl/xc_monitor.c   | 13 +
 xen/arch/x86/hvm/monitor.c | 21 +
 xen/arch/x86/hvm/svm/svm.c |  8 
 xen/arch/x86/hvm/vmx/vmx.c | 21 ++---
 xen/arch/x86/include/asm/domain.h  |  1 +
 xen/arch/x86/include/asm/hvm/monitor.h |  3 +++
 xen/arch/x86/include/asm/monitor.h |  3 ++-
 xen/arch/x86/monitor.c | 13 +
 xen/include/public/domctl.h|  1 +
 xen/include/public/vm_event.h  | 10 ++
 11 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 23037874d3..05967ecc92 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2102,6 +2102,7 @@ int xc_monitor_emul_unimplemented(xc_interface *xch, 
uint32_t domain_id,
   bool enable);
 int xc_monitor_vmexit(xc_interface *xch, uint32_t domain_id, bool enable,
   bool sync);
+int xc_monitor_io(xc_interface *xch, uint32_t domain_id, bool enable);
 /**
  * This function enables / disables emulation for each REP for a
  * REP-compatible instruction.
diff --git a/tools/libs/ctrl/xc_monitor.c b/tools/libs/ctrl/xc_monitor.c
index c5fa62ff30..3cb96f444f 100644
--- a/tools/libs/ctrl/xc_monitor.c
+++ b/tools/libs/ctrl/xc_monitor.c
@@ -261,6 +261,19 @@ int xc_monitor_vmexit(xc_interface *xch, uint32_t 
domain_id, bool enable,
 return do_domctl(xch, &domctl);
 }
 
+int xc_monitor_io(xc_interface *xch, uint32_t domain_id, bool enable)
+{
+DECLARE_DOMCTL;
+
+domctl.cmd = XEN_DOMCTL_monitor_op;
+domctl.domain = domain_id;
+domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE
+: XEN_DOMCTL_MONITOR_OP_DISABLE;
+domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_IO;
+
+return do_domctl(xch, &domctl);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
index a11cd76f4d..ff958b6c05 100644
--- a/xen/arch/x86/hvm/monitor.c
+++ b/xen/arch/x86/hvm/monitor.c
@@ -346,6 +346,27 @@ int hvm_monitor_vmexit(unsigned long exit_reason,
 return monitor_traps(curr, ad->monitor.vmexit_sync, &req);
 }
 
+int hvm_monitor_io(uint16_t port, unsigned int bytes,
+   int dir, bool string_ins)
+{
+struct vcpu *curr = current;
+struct arch_domain *ad = &curr->domain->arch;
+vm_event_request_t req = {};
+
+if ( !ad->monitor.io_enabled )
+return 0;
+
+req.reason = VM_EVENT_REASON_IO_INSTRUCTION;
+req.u.io.data_size = bytes;
+req.u.io.port = port;
+req.u.io.dir = dir;
+req.u.io.string_ins = string_ins;
+
+set_npt_base(curr, &req);
+
+return monitor_traps(curr, true, &req);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index bfe03316de..b55f825f5d 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2939,6 +2939,14 @@ void svm_vmexit_handler(void)
 break;
 
 case VMEXIT_IOIO:
+rc = hvm_monitor_io(vmcb->ei.io.port,
+vmcb->ei.io.bytes,
+vmcb->ei.io.in ? IOREQ_READ : IOREQ_WRITE,
+vmcb->ei.io.str);
+if ( rc < 0 )
+goto unexpected_exit_type;
+if ( rc )
+break;
 if ( !vmcb->ei.io.str )
 {
 if ( handle_pio(vmcb->ei.io.port,
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 00b531f76c..8a278f16c5 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4560,7 +4560,24 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 break;
 
 case EXIT_REASON_IO_INSTRUCTION:
+{
+uint16_t port;
+int bytes, dir;
+bool string_ins;
+int rc;
+
 __vmread(EXIT_QUALIFICATION, &exit_qualification);
+
+port = (exit_qualification >> 16) & 0x;
+bytes = (exit_qualification & 0x07) + 1;
+dir = (exit_qualification & 0x08) ? IOREQ_READ : IOREQ_WRITE;
+string_ins = (exit_qualification & 0x10);
+rc = hvm_monitor_io(port, bytes, dir, string_ins);
+if ( rc < 0 )
+goto exit_and_crash;
+if ( rc )
+break;
+
   

Re: [PATCH 4/4] Update README to state Python3 requirement

2023-03-17 Thread Marek Marczykowski-Górecki
On Fri, Mar 17, 2023 at 09:46:33AM +0100, Jan Beulich wrote:
> On 16.03.2023 18:16, Marek Marczykowski-Górecki wrote:
> > Python2 is not supported anymore.
> 
> There are two things here which concern me: For one, how come this is
> at the end of a series? You want to keep in mind that any series may
> be committed piecemeal (unless an indication to the contrary is in
> the cover letter, but there's none here in the first place).
> 
> The other aspect is that there's no indication here of it being
> consensus that we raise the baseline requirement for Python, and for
> Python alone. A decision towards the wider topic of raising baseline
> requirements is, as you may recall from the meeting in Cambridge,
> still pending.

Hmm, in fact the only part of this series that isn't python2 compatible
anymore is "install-python-bindings" target in tools/libs/stat/Makefile.
And it's enabled only with XENSTAT_PYTHON_BINDING=y is explicitly set.
So, maybe this readme change isn't relevant at all, at least not yet.
-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


[ovmf test] 179713: all pass - PUSHED

2023-03-17 Thread osstest service owner
flight 179713 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/179713/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 410ca0ff94a42ee541dd6ceab70ea974eeb7e500
baseline version:
 ovmf 0e5717009779ec6c1e35f979426a2cd407b3e73a

Last test of basis   179705  2023-03-17 04:10:46 Z0 days
Testing same since   179713  2023-03-17 08:15:15 Z0 days1 attempts


People who touched revisions under test:
  Gang Chen 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   0e57170097..410ca0ff94  410ca0ff94a42ee541dd6ceab70ea974eeb7e500 -> 
xen-tested-master



Re: [PATCH v2] x86: use POPCNT for hweight() when available

2023-03-17 Thread Andrew Cooper
On 17/03/2023 11:22 am, Roger Pau Monné wrote:
> On Mon, Jul 15, 2019 at 02:39:04PM +, Jan Beulich wrote:
>> This is faster than using the software implementation, and the insn is
>> available on all half-way recent hardware. Therefore convert
>> generic_hweight() to out-of-line functions (without affecting Arm)
>> and use alternatives patching to replace the function calls.
>>
>> Note that the approach doesn#t work for clang, due to it not recognizing
>> -ffixed-*.
> I've been giving this a look, and I wonder if it would be fine to
> simply push and pop the scratch registers in the 'call' path of the
> alternative, as that won't require any specific compiler option.

It's been a long while, and in that time I've learnt a lot more about
performance, but my root objection to the approach taken here still
stands - it is penalising the common case to optimise some pointless
corner cases.

Yes - on the call path, an extra push/pop pair (or few) to get temp
registers is basically free.


Right now, the generic_hweight() helpers are static inline in
xen/bitops.h and this is nonsense.  The absolute best they should be is
extern inline in our new lib/ and I'll bet that the compilers stop
inlining them there and then.

Given new abi's like x86_64-v2 (which guarantees POPCNT as an available
feature), it would be nice to arrange to use __builtin_popcnt() to let
the compiler optimise to its hearts content, but outside of this case,
it is actively damaging to try and optimise for memory operands or to
inline the 8/16 case.

So, for x86 specifically, we want:

if ( CONFIG_POPCNT )
    __builtin_popcnt()
else
    ALT( "popcnt D -> a",
         "call arch_popcnt$N" )

and we can write arch_popcnt* in x86's lib/ and in assembly, because
these are trivial enough and we do need to be careful with registers/etc.

I'm not sure if a "+D" vs "D" will matter at the top level.  Probably
not, so it might be an easy way to get one tempt register.  Other temp
registers can come from push/pop.


While we're at it, we should split hweight out of bitops and write the
common header in such a way that it defaults to the generic
implementations in lib/, and that will subsume the ARM header and also
make this work on RISC-V for free.

~Andrew



Re: [PATCH 4/4] Update README to state Python3 requirement

2023-03-17 Thread George Dunlap
On Fri, Mar 17, 2023 at 8:46 AM Jan Beulich  wrote:

> On 16.03.2023 18:16, Marek Marczykowski-Górecki wrote:
> > Python2 is not supported anymore.
>
> There are two things here which concern me: For one, how come this is
> at the end of a series? You want to keep in mind that any series may
> be committed piecemeal (unless an indication to the contrary is in
> the cover letter, but there's none here in the first place).
>
> The other aspect is that there's no indication here of it being
> consensus that we raise the baseline requirement for Python, and for
> Python alone. A decision towards the wider topic of raising baseline
> requirements is, as you may recall from the meeting in Cambridge,
> still pending.
>

To me, the idea behind that discussion was that if we agree on a policy --
or at least general principles -- then we can avoid having to have
discussions on a case-by-case basis.  The fact that the discussion is still
open isn't a reason not to deprecate features; it just means that we still
need to discuss each one on its merits.

Given that Python 2 was announced unsupported years ago now, it seems
obvious to me that we should stop trying to support it.

Are you arguing that we *should* continue to support Python2?  Do you think
anyone will?

If so, please either make that case, or recommend that we raise the profile
of this so that others can.

 -George


Re: [PATCH v4] acpi/processor: fix evaluating _PDC method when running as Xen dom0

2023-03-17 Thread Juergen Gross

On 16.03.23 17:42, Roger Pau Monne wrote:

In ACPI systems, the OS can direct power management, as opposed to the
firmware.  This OS-directed Power Management is called OSPM.  Part of
telling the firmware that the OS going to direct power management is
making ACPI "_PDC" (Processor Driver Capabilities) calls.  These _PDC
methods must be evaluated for every processor object.  If these _PDC
calls are not completed for every processor it can lead to
inconsistency and later failures in things like the CPU frequency
driver.

In a Xen system, the dom0 kernel is responsible for system-wide power
management.  The dom0 kernel is in charge of OSPM.  However, the
number of CPUs available to dom0 can be different than the number of
CPUs physically present on the system.

This leads to a problem: the dom0 kernel needs to evaluate _PDC for
all the processors, but it can't always see them.

In dom0 kernels, ignore the existing ACPI method for determining if a
processor is physically present because it might not be accurate.
Instead, ask the hypervisor for this information.

Fix this by introducing a custom function to use when running as Xen
dom0 in order to check whether a processor object matches a CPU that's
online.  Such checking is done using the existing information fetched
by the Xen pCPU subsystem, extending it to also store the ACPI ID.

This ensures that _PDC method gets evaluated for all physically online
CPUs, regardless of the number of CPUs made available to dom0.

Fixes: 5d554a7bb064 ('ACPI: processor: add internal 
processor_physically_present()')
Signed-off-by: Roger Pau Monné 


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[xen-unstable-smoke test] 179716: tolerable trouble: pass/starved - PUSHED

2023-03-17 Thread osstest service owner
flight 179716 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/179716/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl   1 build-check(1)   starved  n/a
 build-armhf   2 hosts-allocate   starved  n/a

version targeted for testing:
 xen  9c0061825143716c61622966e76983886ef59361
baseline version:
 xen  36e49fc8cbad21a4308b4701caaa685ef04e120b

Last test of basis   179687  2023-03-16 15:00:53 Z0 days
Testing same since   179716  2023-03-17 11:01:55 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Anthony PERARD 
  Marek Marczykowski-Górecki 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  starved 
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  starved 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   36e49fc8cb..9c00618251  9c0061825143716c61622966e76983886ef59361 -> smoke



[PATCH v3 02/10] xen/arm: add SVE vector length field to the domain

2023-03-17 Thread Luca Fancellu
Add sve_vl_bits field to arch_domain and sve_vl field to
xen_arch_domainconfig struct, to allow the domain to have an
information about the SVE feature and the number of SVE register
bits that are allowed for this domain.

sve_vl field is the vector length in bits divided by 128, this
allows to use less space in the xen_arch_domainconfig struct.

The field is used also to allow or forbid a domain to use SVE,
because a value equal to zero means the guest is not allowed to
use the feature.

Check that the requested vector length is lower or equal to the
platform supported vector length, otherwise fail on domain
creation.

Signed-off-by: Luca Fancellu 
---
Changes from v2:
 - rename field in xen_arch_domainconfig from "sve_vl_bits" to
   "sve_vl" and use the implicit padding after gic_version to
   store it, now this field is the VL/128. (Jan)
 - Created domainconfig_decode_vl() function to decode the sve_vl
   field and use it as plain bits value inside arch_domain.
 - Changed commit message reflecting the changes
Changes from v1:
 - no changes
Changes from RFC:
 - restore zcr_el2 in sve_restore_state, that will be introduced
   later in this serie, so remove zcr_el2 related code from this
   patch and move everything to the later patch (Julien)
 - add explicit padding into struct xen_arch_domainconfig (Julien)
 - Don't lower down the vector length, just fail to create the
   domain. (Julien)
---
 xen/arch/arm/arm64/sve.c |  9 
 xen/arch/arm/domain.c| 32 
 xen/arch/arm/include/asm/arm64/sve.h | 18 
 xen/arch/arm/include/asm/domain.h|  5 +
 xen/include/public/arch-arm.h|  2 ++
 xen/include/public/domctl.h  |  2 +-
 6 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/arm64/sve.c b/xen/arch/arm/arm64/sve.c
index c466de61b47f..4a4ff5dbef49 100644
--- a/xen/arch/arm/arm64/sve.c
+++ b/xen/arch/arm/arm64/sve.c
@@ -6,6 +6,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -48,3 +49,11 @@ register_t vl_to_zcr(uint16_t vl)
 ASSERT(vl > 0);
 return ((vl / SVE_VL_MULTIPLE_VAL) - 1U) & ZCR_ELx_LEN_MASK;
 }
+
+/* Get the system sanitized value for VL in bits */
+uint16_t get_sys_vl_len(void)
+{
+/* ZCR_ELx len field is ((len+1) * 128) = vector bits length */
+return ((system_cpuinfo.zcr64.bits[0] & ZCR_ELx_LEN_MASK) + 1U) *
+SVE_VL_MULTIPLE_VAL;
+}
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index adb6ace2e24d..470e8607fabe 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -13,6 +13,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -550,6 +551,8 @@ int arch_vcpu_create(struct vcpu *v)
 v->arch.vmpidr = MPIDR_SMP | vcpuid_to_vaffinity(v->vcpu_id);
 
 v->arch.cptr_el2 = get_default_cptr_flags();
+if ( is_sve_domain(v->domain) )
+v->arch.cptr_el2 &= ~HCPTR_CP(8);
 
 v->arch.hcr_el2 = get_default_hcr_flags();
 
@@ -594,6 +597,7 @@ int arch_sanitise_domain_config(struct 
xen_domctl_createdomain *config)
 unsigned int max_vcpus;
 unsigned int flags_required = (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap);
 unsigned int flags_optional = (XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_vpmu);
+unsigned int sve_vl_bits = domainconfig_decode_vl(config->arch.sve_vl);
 
 if ( (config->flags & ~flags_optional) != flags_required )
 {
@@ -602,6 +606,31 @@ int arch_sanitise_domain_config(struct 
xen_domctl_createdomain *config)
 return -EINVAL;
 }
 
+/* Check feature flags */
+if ( sve_vl_bits > 0 ) {
+unsigned int zcr_max_bits = get_sys_vl_len();
+
+if ( !cpu_has_sve )
+{
+dprintk(XENLOG_INFO, "SVE is unsupported on this machine.\n");
+return -EINVAL;
+}
+else if ( !is_vl_valid(sve_vl_bits) )
+{
+dprintk(XENLOG_INFO, "Unsupported SVE vector length (%u)\n",
+sve_vl_bits);
+return -EINVAL;
+}
+else if ( sve_vl_bits > zcr_max_bits )
+{
+dprintk(XENLOG_INFO,
+"The requested SVE vector length (%u) must be lower or \n"
+"equal to the platform supported vector length (%u)\n",
+sve_vl_bits, zcr_max_bits);
+return -EINVAL;
+}
+}
+
 /* The P2M table must always be shared between the CPU and the IOMMU */
 if ( config->iommu_opts & XEN_DOMCTL_IOMMU_no_sharept )
 {
@@ -744,6 +773,9 @@ int arch_domain_create(struct domain *d,
 if ( (rc = domain_vpci_init(d)) != 0 )
 goto fail;
 
+/* Copy and decode sve_vl from the domain configuration */
+d->arch.sve_vl_bits = domainconfig_decode_vl(config->arch.sve_vl);
+
 return 0;
 
 fail:
diff --git a/xen/arch/arm/include/asm/arm64/sve.h 
b/xen/arch/arm/include/asm/arm64/sve.h
index bd56e2f24230..4b0b941c243b 100644
--- a/xen/arch/arm/include/asm/arm64/sve.h
+++ b/x

[PATCH v3 01/10] xen/arm: enable SVE extension for Xen

2023-03-17 Thread Luca Fancellu
Enable Xen to handle the SVE extension, add code in cpufeature module
to handle ZCR SVE register, disable trapping SVE feature on system
boot only when SVE resources are accessed.
While there, correct coding style for the comment on coprocessor
trapping.

Now cptr_el2 is part of the domain context and it will be restored
on context switch, this is a preparation for saving the SVE context
which will be part of VFP operations, so restore it before the call
to save VFP registers.
To save an additional isb barrier, restore cptr_el2 before an
existing isb barrier and move the call for saving VFP context after
that barrier.

Change the KConfig entry to make ARM64_SVE symbol selectable, by
default it will be not selected.

Create sve module and sve_asm.S that contains assembly routines for
the SVE feature, this code is inspired from linux and it uses
instruction encoding to be compatible with compilers that does not
support SVE.

Signed-off-by: Luca Fancellu 
---
Changes from v2:
 - renamed sve_asm.S in sve-asm.S, new files should not contain
   underscore in the name (Jan)
Changes from v1:
 - Add assert to vl_to_zcr, it is never called with vl==0, but just
   to be sure it won't in the future.
Changes from RFC:
 - Moved restoring of cptr before an existing barrier (Julien)
 - Marked the feature as unsupported for now (Julien)
 - Trap and un-trap only when using SVE resources in
   compute_max_zcr() (Julien)
---
 xen/arch/arm/Kconfig | 10 +++--
 xen/arch/arm/arm64/Makefile  |  1 +
 xen/arch/arm/arm64/cpufeature.c  |  7 ++--
 xen/arch/arm/arm64/sve-asm.S | 48 +++
 xen/arch/arm/arm64/sve.c | 50 
 xen/arch/arm/cpufeature.c|  6 ++-
 xen/arch/arm/domain.c|  9 +++--
 xen/arch/arm/include/asm/arm64/sve.h | 43 
 xen/arch/arm/include/asm/arm64/sysregs.h |  1 +
 xen/arch/arm/include/asm/cpufeature.h| 14 +++
 xen/arch/arm/include/asm/domain.h|  1 +
 xen/arch/arm/include/asm/processor.h |  2 +
 xen/arch/arm/setup.c |  5 ++-
 xen/arch/arm/traps.c | 28 +++--
 14 files changed, 201 insertions(+), 24 deletions(-)
 create mode 100644 xen/arch/arm/arm64/sve-asm.S
 create mode 100644 xen/arch/arm/arm64/sve.c
 create mode 100644 xen/arch/arm/include/asm/arm64/sve.h

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 239d3aed3c7f..41f45d8d1203 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -112,11 +112,15 @@ config ARM64_PTR_AUTH
  This feature is not supported in Xen.
 
 config ARM64_SVE
-   def_bool n
+   bool "Enable Scalar Vector Extension support (UNSUPPORTED)" if 
UNSUPPORTED
depends on ARM_64
help
- Scalar Vector Extension support.
- This feature is not supported in Xen.
+ Scalar Vector Extension (SVE/SVE2) support for guests.
+
+ Please be aware that currently, enabling this feature will add 
latency on
+ VM context switch between SVE enabled guests, between not-enabled SVE
+ guests and SVE enabled guests and viceversa, compared to the time
+ required to switch between not-enabled SVE guests.
 
 config ARM64_MTE
def_bool n
diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index 6d507da0d44d..24e08fd42596 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -12,6 +12,7 @@ obj-y += insn.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-y += smc.o
 obj-y += smpboot.o
+obj-$(CONFIG_ARM64_SVE) += sve.o sve-asm.o
 obj-y += traps.o
 obj-y += vfp.o
 obj-y += vsysreg.o
diff --git a/xen/arch/arm/arm64/cpufeature.c b/xen/arch/arm/arm64/cpufeature.c
index d9039d37b2d1..b4656ff4d80f 100644
--- a/xen/arch/arm/arm64/cpufeature.c
+++ b/xen/arch/arm/arm64/cpufeature.c
@@ -455,15 +455,11 @@ static const struct arm64_ftr_bits ftr_id_dfr1[] = {
ARM64_FTR_END,
 };
 
-#if 0
-/* TODO: use this to sanitize SVE once we support it */
-
 static const struct arm64_ftr_bits ftr_zcr[] = {
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE,
ZCR_ELx_LEN_SHIFT, ZCR_ELx_LEN_SIZE, 0),/* LEN */
ARM64_FTR_END,
 };
-#endif
 
 /*
  * Common ftr bits for a 32bit register with all hidden, strict
@@ -603,6 +599,9 @@ void update_system_features(const struct cpuinfo_arm *new)
 
SANITIZE_ID_REG(zfr64, 0, aa64zfr0);
 
+   if ( cpu_has_sve )
+   SANITIZE_REG(zcr64, 0, zcr);
+
/*
 * Comment from Linux:
 * Userspace may perform DC ZVA instructions. Mismatched block sizes
diff --git a/xen/arch/arm/arm64/sve-asm.S b/xen/arch/arm/arm64/sve-asm.S
new file mode 100644
index ..4d1549344733
--- /dev/null
+++ b/xen/arch/arm/arm64/sve-asm.S
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Arm SVE assembly routines
+ *
+ * Copyright (C) 2022 ARM Ltd.

[PATCH v3 03/10] xen/arm: Expose SVE feature to the guest

2023-03-17 Thread Luca Fancellu
When a guest is allowed to use SVE, expose the SVE features through
the identification registers.

Signed-off-by: Luca Fancellu 
---
Changes from v2:
 - no changes
Changes from v1:
 - No changes
Changes from RFC:
 - No changes
---
 xen/arch/arm/arm64/vsysreg.c | 39 ++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
index 758750983c11..10048bb4d221 100644
--- a/xen/arch/arm/arm64/vsysreg.c
+++ b/xen/arch/arm/arm64/vsysreg.c
@@ -18,6 +18,7 @@
 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -295,7 +296,28 @@ void do_sysreg(struct cpu_user_regs *regs,
 GENERATE_TID3_INFO(MVFR0_EL1, mvfr, 0)
 GENERATE_TID3_INFO(MVFR1_EL1, mvfr, 1)
 GENERATE_TID3_INFO(MVFR2_EL1, mvfr, 2)
-GENERATE_TID3_INFO(ID_AA64PFR0_EL1, pfr64, 0)
+
+case HSR_SYSREG_ID_AA64PFR0_EL1:
+{
+register_t guest_reg_value = guest_cpuinfo.pfr64.bits[0];
+
+if ( is_sve_domain(v->domain) )
+{
+/* 4 is the SVE field width in id_aa64pfr0_el1 */
+uint64_t mask = GENMASK(ID_AA64PFR0_SVE_SHIFT + 4 - 1,
+ID_AA64PFR0_SVE_SHIFT);
+/* sysval is the sve field on the system */
+uint64_t sysval = cpuid_feature_extract_unsigned_field_width(
+system_cpuinfo.pfr64.bits[0],
+ID_AA64PFR0_SVE_SHIFT, 4);
+guest_reg_value &= ~mask;
+guest_reg_value |= (sysval << ID_AA64PFR0_SVE_SHIFT) & mask;
+}
+
+return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 1,
+  guest_reg_value);
+}
+
 GENERATE_TID3_INFO(ID_AA64PFR1_EL1, pfr64, 1)
 GENERATE_TID3_INFO(ID_AA64DFR0_EL1, dbg64, 0)
 GENERATE_TID3_INFO(ID_AA64DFR1_EL1, dbg64, 1)
@@ -306,7 +328,20 @@ void do_sysreg(struct cpu_user_regs *regs,
 GENERATE_TID3_INFO(ID_AA64MMFR2_EL1, mm64, 2)
 GENERATE_TID3_INFO(ID_AA64AFR0_EL1, aux64, 0)
 GENERATE_TID3_INFO(ID_AA64AFR1_EL1, aux64, 1)
-GENERATE_TID3_INFO(ID_AA64ZFR0_EL1, zfr64, 0)
+
+case HSR_SYSREG_ID_AA64ZFR0_EL1:
+{
+/*
+ * When the guest has the SVE feature enabled, the whole 
id_aa64zfr0_el1
+ * needs to be exposed.
+ */
+register_t guest_reg_value = guest_cpuinfo.zfr64.bits[0];
+if ( is_sve_domain(v->domain) )
+guest_reg_value = system_cpuinfo.zfr64.bits[0];
+
+return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 1,
+  guest_reg_value);
+}
 
 /*
  * Those cases are catching all Reserved registers trapped by TID3 which
-- 
2.34.1




[PATCH v3 07/10] xen/physinfo: encode Arm SVE vector length in arch_capabilities

2023-03-17 Thread Luca Fancellu
When the arm platform supports SVE, advertise the feature in the
field arch_capabilities in struct xen_sysctl_physinfo by encoding
the SVE vector length in it.

Signed-off-by: Luca Fancellu 
---
Changes from v2:
 - Remove XEN_SYSCTL_PHYSCAP_ARM_SVE_SHFT, use MASK_INSR and
   protect with ifdef XEN_SYSCTL_PHYSCAP_ARM_SVE_MASK (Jan)
 - Use the helper function sve_arch_cap_physinfo to encode
   the VL into physinfo arch_capabilities field.
Changes from v1:
 - Use only arch_capabilities and some defines to encode SVE VL
   (Bertrand, Stefano, Jan)
Changes from RFC:
 - new patch
---
 xen/arch/arm/arm64/sve.c | 12 
 xen/arch/arm/include/asm/arm64/sve.h |  2 ++
 xen/arch/arm/sysctl.c|  3 +++
 xen/include/public/sysctl.h  |  4 
 4 files changed, 21 insertions(+)

diff --git a/xen/arch/arm/arm64/sve.c b/xen/arch/arm/arm64/sve.c
index 6593b59b58e8..409b029e4d21 100644
--- a/xen/arch/arm/arm64/sve.c
+++ b/xen/arch/arm/arm64/sve.c
@@ -117,3 +117,15 @@ void sve_restore_state(struct vcpu *v)
 
 sve_load_ctx(sve_ctx_zreg_end, v->arch.vfp.fpregs, 1);
 }
+
+void sve_arch_cap_physinfo(uint32_t *arch_capabilities)
+{
+if ( cpu_has_sve )
+{
+/* Vector length is divided by 128 to save some space */
+uint32_t sve_vl = MASK_INSR(domainconfig_encode_vl(get_sys_vl_len()),
+XEN_SYSCTL_PHYSCAP_ARM_SVE_MASK);
+
+*arch_capabilities |= sve_vl;
+}
+}
diff --git a/xen/arch/arm/include/asm/arm64/sve.h 
b/xen/arch/arm/include/asm/arm64/sve.h
index 8654b0fac4bc..34b4c39191cb 100644
--- a/xen/arch/arm/include/asm/arm64/sve.h
+++ b/xen/arch/arm/include/asm/arm64/sve.h
@@ -41,6 +41,7 @@ int sve_context_init(struct vcpu *v);
 void sve_context_free(struct vcpu *v);
 void sve_save_state(struct vcpu *v);
 void sve_restore_state(struct vcpu *v);
+void sve_arch_cap_physinfo(uint32_t *arch_capabilities);
 
 #else /* !CONFIG_ARM64_SVE */
 
@@ -69,6 +70,7 @@ static inline int sve_context_init(struct vcpu *v)
 static inline void sve_context_free(struct vcpu *v) {}
 static inline void sve_save_state(struct vcpu *v) {}
 static inline void sve_restore_state(struct vcpu *v) {}
+static inline void sve_arch_cap_physinfo(uint32_t *arch_capabilities) {}
 
 #endif
 
diff --git a/xen/arch/arm/sysctl.c b/xen/arch/arm/sysctl.c
index b0a78a8b10d0..64e4d3e06a6b 100644
--- a/xen/arch/arm/sysctl.c
+++ b/xen/arch/arm/sysctl.c
@@ -11,11 +11,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
 {
 pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm | XEN_SYSCTL_PHYSCAP_hap;
+
+sve_arch_cap_physinfo(&pi->arch_capabilities);
 }
 
 long arch_do_sysctl(struct xen_sysctl *sysctl,
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index e8dded9fb94a..99ea3fa0740b 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -94,6 +94,10 @@ struct xen_sysctl_tbuf_op {
 /* Max XEN_SYSCTL_PHYSCAP_* constant.  Used for ABI checking. */
 #define XEN_SYSCTL_PHYSCAP_MAX XEN_SYSCTL_PHYSCAP_gnttab_v2
 
+#ifdef __aarch64__
+#define XEN_SYSCTL_PHYSCAP_ARM_SVE_MASK  (0x1FU)
+#endif
+
 struct xen_sysctl_physinfo {
 uint32_t threads_per_core;
 uint32_t cores_per_socket;
-- 
2.34.1




[PATCH v3 04/10] xen/arm: add SVE exception class handling

2023-03-17 Thread Luca Fancellu
SVE has a new exception class with code 0x19, introduce the new code
and handle the exception.

Signed-off-by: Luca Fancellu 
---
Changes from v2:
 - No changes
Changes from v1:
 - No changes
Changes from RFC:
 - No changes
---
 xen/arch/arm/include/asm/processor.h |  1 +
 xen/arch/arm/traps.c | 12 
 2 files changed, 13 insertions(+)

diff --git a/xen/arch/arm/include/asm/processor.h 
b/xen/arch/arm/include/asm/processor.h
index 0e38926b94db..625c2bd0cd6c 100644
--- a/xen/arch/arm/include/asm/processor.h
+++ b/xen/arch/arm/include/asm/processor.h
@@ -426,6 +426,7 @@
 #define HSR_EC_HVC640x16
 #define HSR_EC_SMC640x17
 #define HSR_EC_SYSREG   0x18
+#define HSR_EC_SVE  0x19
 #endif
 #define HSR_EC_INSTR_ABORT_LOWER_EL 0x20
 #define HSR_EC_INSTR_ABORT_CURR_EL  0x21
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index a78a99ddadd0..c2e30feafd5a 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2160,6 +2160,13 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)
 perfc_incr(trap_sysreg);
 do_sysreg(regs, hsr);
 break;
+case HSR_EC_SVE:
+GUEST_BUG_ON(regs_mode_is_32bit(regs));
+gprintk(XENLOG_WARNING,
+"Domain id %d tried to use SVE while not allowed\n",
+current->domain->domain_id);
+inject_undef_exception(regs, hsr);
+break;
 #endif
 
 case HSR_EC_INSTR_ABORT_LOWER_EL:
@@ -2189,6 +2196,11 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
 case HSR_EC_BRK:
 do_trap_brk(regs, hsr);
 break;
+case HSR_EC_SVE:
+/* An SVE exception is a bug somewhere in hypervisor code */
+printk("SVE trap at EL2.\n");
+do_unexpected_trap("Hypervisor", regs);
+break;
 #endif
 case HSR_EC_DATA_ABORT_CURR_EL:
 case HSR_EC_INSTR_ABORT_CURR_EL:
-- 
2.34.1




[PATCH v3 08/10] tools: add physinfo arch_capabilities handling for Arm

2023-03-17 Thread Luca Fancellu
On Arm, the SVE vector length is encoded in arch_capabilities field
of struct xen_sysctl_physinfo, make use of this field in the tools
when building for arm.

Signed-off-by: Luca Fancellu 
---
Changes from v2:
 - rename arm_arch_capabilities.h in arm-arch-capabilities.h, use
   MASK_EXTR.
 - Now arm-arch-capabilities.h needs MASK_EXTR macro, but it is
   defined in libxl_internal.h, it doesn't feel right to include
   that header so move MASK_EXTR into xen-tools/libs.h that is also
   included in libxl_internal.h
Changes from v1:
 - now SVE VL is encoded in arch_capabilities on Arm
Changes from RFC:
 - new patch
---
 tools/golang/xenlight/helpers.gen.go  |  2 ++
 tools/golang/xenlight/types.gen.go|  1 +
 tools/include/arm-arch-capabilities.h | 33 +++
 tools/include/xen-tools/libs.h|  2 ++
 tools/libs/light/libxl.c  |  1 +
 tools/libs/light/libxl_internal.h |  1 -
 tools/libs/light/libxl_types.idl  |  1 +
 tools/ocaml/libs/xc/xenctrl.ml|  4 +---
 tools/ocaml/libs/xc/xenctrl.mli   |  4 +---
 tools/ocaml/libs/xc/xenctrl_stubs.c   |  8 ---
 tools/python/xen/lowlevel/xc/xc.c |  8 +--
 tools/xl/xl_info.c|  8 +++
 12 files changed, 61 insertions(+), 12 deletions(-)
 create mode 100644 tools/include/arm-arch-capabilities.h

diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index 0a203d22321f..35397be2f9e2 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -3506,6 +3506,7 @@ x.CapVmtrace = bool(xc.cap_vmtrace)
 x.CapVpmu = bool(xc.cap_vpmu)
 x.CapGnttabV1 = bool(xc.cap_gnttab_v1)
 x.CapGnttabV2 = bool(xc.cap_gnttab_v2)
+x.ArchCapabilities = uint32(xc.arch_capabilities)
 
  return nil}
 
@@ -3540,6 +3541,7 @@ xc.cap_vmtrace = C.bool(x.CapVmtrace)
 xc.cap_vpmu = C.bool(x.CapVpmu)
 xc.cap_gnttab_v1 = C.bool(x.CapGnttabV1)
 xc.cap_gnttab_v2 = C.bool(x.CapGnttabV2)
+xc.arch_capabilities = C.uint32_t(x.ArchCapabilities)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index a7c17699f80e..3d968a496744 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -1079,6 +1079,7 @@ CapVmtrace bool
 CapVpmu bool
 CapGnttabV1 bool
 CapGnttabV2 bool
+ArchCapabilities uint32
 }
 
 type Connectorinfo struct {
diff --git a/tools/include/arm-arch-capabilities.h 
b/tools/include/arm-arch-capabilities.h
new file mode 100644
index ..e07384652b14
--- /dev/null
+++ b/tools/include/arm-arch-capabilities.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#ifndef ARM_ARCH_CAPABILITIES_H
+#define ARM_ARCH_CAPABILITIES_H
+
+/* Tell the Xen public headers we are a user-space tools build. */
+#ifndef __XEN_TOOLS__
+#define __XEN_TOOLS__ 1
+#endif
+
+#include 
+#include 
+
+#include 
+
+static inline
+unsigned int arch_capabilities_arm_sve(unsigned int arch_capabilities)
+{
+#if defined(__aarch64__)
+unsigned int sve_vl = MASK_EXTR(arch_capabilities,
+XEN_SYSCTL_PHYSCAP_ARM_SVE_MASK);
+
+/* Vector length is divided by 128 before storing it in arch_capabilities 
*/
+return sve_vl * 128U;
+#else
+return 0;
+#endif
+}
+
+#endif /* ARM_ARCH_CAPABILITIES_H */
diff --git a/tools/include/xen-tools/libs.h b/tools/include/xen-tools/libs.h
index bafc90e2f603..8850dcef4b7f 100644
--- a/tools/include/xen-tools/libs.h
+++ b/tools/include/xen-tools/libs.h
@@ -63,6 +63,8 @@
 #define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1))
 #endif
 
+#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))
+
 #ifndef __must_check
 #define __must_check __attribute__((__warn_unused_result__))
 #endif
diff --git a/tools/libs/light/libxl.c b/tools/libs/light/libxl.c
index a0bf7d186f69..175d6dde0b80 100644
--- a/tools/libs/light/libxl.c
+++ b/tools/libs/light/libxl.c
@@ -409,6 +409,7 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
*physinfo)
 !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_gnttab_v1);
 physinfo->cap_gnttab_v2 =
 !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_gnttab_v2);
+physinfo->arch_capabilities = xcphysinfo.arch_capabilities;
 
 GC_FREE;
 return 0;
diff --git a/tools/libs/light/libxl_internal.h 
b/tools/libs/light/libxl_internal.h
index ad982d691ab9..e8c1b2c2d3e7 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -132,7 +132,6 @@
 
 #define DIV_ROUNDUP(n, d) (((n) + (d) - 1) / (d))
 
-#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))
 #define MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m))
 
 #define LIBXL__LOGGING_ENABLED
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index c10292e0d7e3..fd31dacf7d5a 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -1133,6 +1133,7 @@ libxl_physinfo = Struct("phy

[PATCH v3 00/10] SVE feature for arm guests

2023-03-17 Thread Luca Fancellu
This serie is introducing the possibility for Dom0 and DomU guests to use
sve/sve2 instructions.

SVE feature introduces new instruction and registers to improve performances on
floating point operations.

The SVE feature is advertised using the ID_AA64PFR0_EL1 register, SVE field, and
when available the ID_AA64ZFR0_EL1 register provides additional information
about the implemented version and other SVE feature.

New registers added by the SVE feature are Z0-Z31, P0-P15, FFR, ZCR_ELx.

Z0-Z31 are scalable vector register whose size is implementation defined and
goes from 128 bits to maximum 2048, the term vector length will be used to refer
to this quantity.
P0-P15 are predicate registers and the size is the vector length divided by 8,
same size is the FFR (First Fault Register).
ZCR_ELx is a register that can control and restrict the maximum vector length
used by the  exception level and all the lower exception levels, so for
example EL3 can restrict the vector length usable by EL3,2,1,0.

The platform has a maximum implemented vector length, so for every value
written in ZCR register, if this value is above the implemented length, then the
lower value will be used. The RDVL instruction can be used to check what vector
length is the HW using after setting ZCR.

For an SVE guest, the V0-V31 registers are part of the Z0-Z31, so there is no
need to save them separately, saving Z0-Z31 will save implicitly also V0-V31.

SVE usage can be trapped using a flag in CPTR_EL2, hence in this serie the
register is added to the domain state, to be able to trap only the guests that
are not allowed to use SVE.

This serie is introducing a command line parameter to enable Dom0 to use SVE and
to set its maximum vector length that by default is 0 which means the guest is
not allowed to use SVE. Values from 128 to 2048 mean the guest can use SVE with
the selected value used as maximum allowed vector length (which could be lower
if the implemented one is lower).
For DomUs, an XL parameter with the same way of use is introduced and a dom0less
DTB binding is created.

The context switch is the most critical part because there can be big registers
to be saved, in this serie an easy approach is used and the context is
saved/restored every time for the guests that are allowed to use SVE.

Luca Fancellu (10):
  xen/arm: enable SVE extension for Xen
  xen/arm: add SVE vector length field to the domain
  xen/arm: Expose SVE feature to the guest
  xen/arm: add SVE exception class handling
  arm/sve: save/restore SVE context switch
  xen/arm: enable Dom0 to use SVE feature
  xen/physinfo: encode Arm SVE vector length in arch_capabilities
  tools: add physinfo arch_capabilities handling for Arm
  xen/tools: add sve parameter in XL configuration
  xen/arm: add sve property for dom0less domUs

 docs/man/xl.cfg.5.pod.in |  11 ++
 docs/misc/arm/device-tree/booting.txt|   9 ++
 docs/misc/xen-command-line.pandoc|  13 ++
 tools/golang/xenlight/helpers.gen.go |   4 +
 tools/golang/xenlight/types.gen.go   |   2 +
 tools/include/arm-arch-capabilities.h|  33 
 tools/include/libxl.h|   5 +
 tools/include/xen-tools/libs.h   |   2 +
 tools/libs/light/libxl.c |   1 +
 tools/libs/light/libxl_arm.c |   2 +
 tools/libs/light/libxl_internal.h|   1 -
 tools/libs/light/libxl_types.idl |   2 +
 tools/ocaml/libs/xc/xenctrl.ml   |   4 +-
 tools/ocaml/libs/xc/xenctrl.mli  |   4 +-
 tools/ocaml/libs/xc/xenctrl_stubs.c  |   8 +-
 tools/python/xen/lowlevel/xc/xc.c|   8 +-
 tools/xl/xl_info.c   |   8 +
 tools/xl/xl_parse.c  |  26 +++-
 xen/arch/arm/Kconfig |  10 +-
 xen/arch/arm/arm64/Makefile  |   1 +
 xen/arch/arm/arm64/cpufeature.c  |   7 +-
 xen/arch/arm/arm64/sve-asm.S | 189 +++
 xen/arch/arm/arm64/sve.c | 131 
 xen/arch/arm/arm64/vfp.c |  79 ++
 xen/arch/arm/arm64/vsysreg.c |  39 -
 xen/arch/arm/cpufeature.c|   6 +-
 xen/arch/arm/domain.c|  48 +-
 xen/arch/arm/domain_build.c  |  11 ++
 xen/arch/arm/include/asm/arm64/sve.h |  85 ++
 xen/arch/arm/include/asm/arm64/sysregs.h |   4 +
 xen/arch/arm/include/asm/arm64/vfp.h |  10 ++
 xen/arch/arm/include/asm/cpufeature.h|  14 ++
 xen/arch/arm/include/asm/domain.h|   8 +
 xen/arch/arm/include/asm/processor.h |   3 +
 xen/arch/arm/setup.c |   5 +-
 xen/arch/arm/sysctl.c|   3 +
 xen/arch/arm/traps.c |  40 +++--
 xen/include/public/arch-arm.h|   2 +
 xen/include/public/domctl.h  |   2 +-
 xen/include/public/sysctl.h  |   4 +
 40 files changed, 769 insertions(+), 75 deletions(-)
 create mode 100644 tools/include/arm-arch-ca

[PATCH v3 06/10] xen/arm: enable Dom0 to use SVE feature

2023-03-17 Thread Luca Fancellu
Add a command line parameter to allow Dom0 the use of SVE resources,
the command line parameter dom0_sve controls the feature on this
domain and sets the maximum SVE vector length for Dom0.

Signed-off-by: Luca Fancellu 
---
Changes from v2:
 - xen_domctl_createdomain field has changed into sve_vl and its
   value now is the VL / 128, create an helper function for that.
Changes from v1:
 - No changes
Changes from RFC:
 - Changed docs to explain that the domain won't be created if the
   requested vector length is above the supported one from the
   platform.
---
 docs/misc/xen-command-line.pandoc| 13 +
 xen/arch/arm/arm64/sve.c |  5 +
 xen/arch/arm/domain_build.c  |  4 
 xen/arch/arm/include/asm/arm64/sve.h |  9 +
 4 files changed, 31 insertions(+)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index e0b89b7d3319..595e0d17183c 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1005,6 +1005,19 @@ restrictions set up here. Note that the values to be 
specified here are
 ACPI PXM ones, not Xen internal node numbers. `relaxed` sets up vCPU
 affinities to prefer but be not limited to the specified node(s).
 
+### dom0_sve (arm)
+> `= `
+
+> Default: `0`
+
+Enable arm SVE usage for Dom0 domain and sets the maximum SVE vector length.
+Values above 0 means feature is enabled for Dom0, otherwise feature is 
disabled.
+Possible values are from 0 to maximum 2048, being multiple of 128, that will be
+the maximum vector length.
+Please note that the platform can supports a lower value, if the requested 
value
+is above the supported one, the domain creation will fail and the system will
+stop.
+
 ### dom0_vcpus_pin
 > `= `
 
diff --git a/xen/arch/arm/arm64/sve.c b/xen/arch/arm/arm64/sve.c
index 1b95a3cbadd1..6593b59b58e8 100644
--- a/xen/arch/arm/arm64/sve.c
+++ b/xen/arch/arm/arm64/sve.c
@@ -5,10 +5,15 @@
  * Copyright (C) 2022 ARM Ltd.
  */
 
+#include 
 #include 
 #include 
 #include 
 
+/* opt_dom0_sve: allow Dom0 to use SVE and set maximum vector length. */
+unsigned int __initdata opt_dom0_sve;
+integer_param("dom0_sve", opt_dom0_sve);
+
 extern unsigned int sve_get_hw_vl(void);
 extern void sve_save_ctx(uint64_t *sve_ctx, uint64_t *pregs, int save_ffr);
 extern void sve_load_ctx(uint64_t const *sve_ctx, uint64_t const *pregs,
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 9707eb7b1bb1..aa5ff52b90c2 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -4084,6 +4085,9 @@ void __init create_dom0(void)
 if ( iommu_enabled )
 dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
+if ( opt_dom0_sve > 0 )
+dom0_cfg.arch.sve_vl = domainconfig_encode_vl(opt_dom0_sve);
+
 dom0 = domain_create(0, &dom0_cfg, CDF_privileged | CDF_directmap);
 if ( IS_ERR(dom0) )
 panic("Error creating domain 0 (rc = %ld)\n", PTR_ERR(dom0));
diff --git a/xen/arch/arm/include/asm/arm64/sve.h 
b/xen/arch/arm/include/asm/arm64/sve.h
index ced010f42dad..8654b0fac4bc 100644
--- a/xen/arch/arm/include/asm/arm64/sve.h
+++ b/xen/arch/arm/include/asm/arm64/sve.h
@@ -25,8 +25,15 @@ static inline uint16_t domainconfig_decode_vl(uint8_t sve_vl)
 return sve_vl * SVE_VL_MULTIPLE_VAL;
 }
 
+static inline uint8_t domainconfig_encode_vl(uint16_t sve_vl_bits)
+{
+return sve_vl_bits / SVE_VL_MULTIPLE_VAL;
+}
+
 #ifdef CONFIG_ARM64_SVE
 
+extern unsigned int opt_dom0_sve;
+
 register_t compute_max_zcr(void);
 register_t vl_to_zcr(uint16_t vl);
 uint16_t get_sys_vl_len(void);
@@ -37,6 +44,8 @@ void sve_restore_state(struct vcpu *v);
 
 #else /* !CONFIG_ARM64_SVE */
 
+#define opt_dom0_sve (0)
+
 static inline register_t compute_max_zcr(void)
 {
 return 0;
-- 
2.34.1




[PATCH v3 10/10] xen/arm: add sve property for dom0less domUs

2023-03-17 Thread Luca Fancellu
Add a device tree property in the dom0less domU configuration
to enable the guest to use SVE.

Update documentation.

Signed-off-by: Luca Fancellu 
---
Changes from v2:
 - xen_domctl_createdomain field name has changed into sve_vl
   and its value is the VL/128, use domainconfig_encode_vl
   to encode a plain VL in bits.
Changes from v1:
 - No changes
Changes from RFC:
 - Changed documentation
---
 docs/misc/arm/device-tree/booting.txt | 9 +
 xen/arch/arm/domain_build.c   | 7 +++
 2 files changed, 16 insertions(+)

diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt
index 3879340b5e0a..d74bf9ab1c8b 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -193,6 +193,15 @@ with the following properties:
 Optional. Handle to a xen,cpupool device tree node that identifies the
 cpupool where the guest will be started at boot.
 
+- sve
+
+Optional. A number that, when above 0, enables SVE for this guest and sets
+its maximum SVE vector length. The default value is 0, that means this
+guest is not allowed to use SVE, the maximum value allowed is 2048, any
+other value must be multiple of 128.
+Please note that if the platform supports a lower value of bits, then the
+domain creation will fail.
+
 - xen,enhanced
 
 A string property. Possible property values are:
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index aa5ff52b90c2..81af231d1d87 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -3964,6 +3964,13 @@ void __init create_domUs(void)
 d_cfg.max_maptrack_frames = val;
 }
 
+if ( dt_property_read_u32(node, "sve", &val) )
+{
+if ( val > UINT16_MAX )
+panic("sve property value (%"PRIu32") overflow\n", val);
+d_cfg.arch.sve_vl = domainconfig_encode_vl(val);
+}
+
 /*
  * The variable max_init_domid is initialized with zero, so here it's
  * very important to use the pre-increment operator to call
-- 
2.34.1




[PATCH v3 09/10] xen/tools: add sve parameter in XL configuration

2023-03-17 Thread Luca Fancellu
Add sve parameter in XL configuration to allow guests to use
SVE feature.

Signed-off-by: Luca Fancellu 
Acked-by: George Dunlap 
---
Changes from v2:
 - domain configuration field name has changed to sve_vl,
   also its value now is VL/128.
 - Add Ack-by George for the Golang bits
Changes from v1:
 - updated to use arch_capabilities field for vector length
Changes from RFC:
 - changed libxl_types.idl sve field to uint16
 - now toolstack uses info from physinfo to check against the
   sve XL value
 - Changed documentation
---
 docs/man/xl.cfg.5.pod.in | 11 +++
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/include/libxl.h|  5 +
 tools/libs/light/libxl_arm.c |  2 ++
 tools/libs/light/libxl_types.idl |  1 +
 tools/xl/xl_parse.c  | 26 --
 7 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 10f37990be57..adf48fe8ac1d 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2952,6 +2952,17 @@ Currently, only the "sbsa_uart" model is supported for 
ARM.
 
 =back
 
+=item B
+
+To enable SVE, user must specify a number different from zero, maximum 2048 and
+multiple of 128. That value will be the maximum number of SVE registers bits
+that the hypervisor will impose to this guest. If the platform has a lower
+supported bits value, then the domain creation will fail.
+A value equal to zero is the default and it means this guest is not allowed to
+use SVE.
+
+=back
+
 =head3 x86
 
 =over 4
diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index 35397be2f9e2..72a3a12a6065 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1149,6 +1149,7 @@ default:
 return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
+x.ArchArm.Sve = uint16(xc.arch_arm.sve)
 if err := x.ArchX86.MsrRelaxed.fromC(&xc.arch_x86.msr_relaxed);err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
@@ -1653,6 +1654,7 @@ default:
 return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
+xc.arch_arm.sve = C.uint16_t(x.ArchArm.Sve)
 if err := x.ArchX86.MsrRelaxed.toC(&xc.arch_x86.msr_relaxed); err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index 3d968a496744..3dc292b5f1be 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -564,6 +564,7 @@ TypeUnion DomainBuildInfoTypeUnion
 ArchArm struct {
 GicVersion GicVersion
 Vuart VuartType
+Sve uint16
 }
 ArchX86 struct {
 MsrRelaxed Defbool
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 5c65222f1ecb..c9719d2d785a 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -283,6 +283,11 @@
  */
 #define LIBXL_HAVE_BUILDINFO_ARCH_ARM_TEE 1
 
+/*
+ * libxl_domain_build_info has the arch_arm.sve field.
+ */
+#define LIBXL_HAVE_BUILDINFO_ARCH_ARM_SVE 1
+
 /*
  * LIBXL_HAVE_SOFT_RESET indicates that libxl supports performing
  * 'soft reset' for domains and there is 'soft_reset' shutdown reason
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index ddc7b2a15975..16a49031fd51 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -211,6 +211,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 return ERROR_FAIL;
 }
 
+config->arch.sve_vl = d_config->b_info.arch_arm.sve;
+
 return 0;
 }
 
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index fd31dacf7d5a..ef4a8358e54e 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -690,6 +690,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 
 ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
("vuart", libxl_vuart_type),
+   ("sve", uint16),
   ])),
 ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
   ])),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index e344d4fda32e..3962ec6ec8cd 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -12,6 +12,7 @@
  * GNU Lesser General Public License for more details.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -1317,8 +1318,6 @@ void parse_config_data(const char *config_source,
 exit(EXIT_FAILURE);
 }
 
-libxl_physinfo_dispose(&physinfo);
-
 config= xlu_cfg_init(stderr, config_source);
 if (!config) {
 fprintf(stderr, "Failed to allocate for configu

[PATCH v3 05/10] arm/sve: save/restore SVE context switch

2023-03-17 Thread Luca Fancellu
Save/restore context switch for SVE, allocate memory to contain
the Z0-31 registers whose length is maximum 2048 bits each and
FFR who can be maximum 256 bits, the allocated memory depends on
how many bits is the vector length for the domain and how many bits
are supported by the platform.

Save P0-15 whose length is maximum 256 bits each, in this case the
memory used is from the fpregs field in struct vfp_state,
because V0-31 are part of Z0-31 and this space would have been
unused for SVE domain otherwise.

Create zcr_el{1,2} fields in arch_vcpu, initialise zcr_el2 on vcpu
creation given the requested vector length and restore it on
context switch, save/restore ZCR_EL1 value as well.

Remove headers from sve.c that are already included using
xen/sched.h.

Signed-off-by: Luca Fancellu 
---
Changes from v2:
 - No changes
Changes from v1:
 - No changes
Changes from RFC:
 - Moved zcr_el2 field introduction in this patch, restore its
   content inside sve_restore_state function. (Julien)
---
 xen/arch/arm/arm64/sve-asm.S | 141 +++
 xen/arch/arm/arm64/sve.c |  65 ++-
 xen/arch/arm/arm64/vfp.c |  79 +++--
 xen/arch/arm/domain.c|   7 ++
 xen/arch/arm/include/asm/arm64/sve.h |  13 +++
 xen/arch/arm/include/asm/arm64/sysregs.h |   3 +
 xen/arch/arm/include/asm/arm64/vfp.h |  10 ++
 xen/arch/arm/include/asm/domain.h|   2 +
 8 files changed, 281 insertions(+), 39 deletions(-)

diff --git a/xen/arch/arm/arm64/sve-asm.S b/xen/arch/arm/arm64/sve-asm.S
index 4d1549344733..8c37d7bc95d5 100644
--- a/xen/arch/arm/arm64/sve-asm.S
+++ b/xen/arch/arm/arm64/sve-asm.S
@@ -17,6 +17,18 @@
 .endif
 .endm
 
+.macro _sve_check_zreg znr
+.if (\znr) < 0 || (\znr) > 31
+.error "Bad Scalable Vector Extension vector register number \znr."
+.endif
+.endm
+
+.macro _sve_check_preg pnr
+.if (\pnr) < 0 || (\pnr) > 15
+.error "Bad Scalable Vector Extension predicate register number \pnr."
+.endif
+.endm
+
 .macro _check_num n, min, max
 .if (\n) < (\min) || (\n) > (\max)
 .error "Number \n out of range [\min,\max]"
@@ -26,6 +38,54 @@
 /* SVE instruction encodings for non-SVE-capable assemblers */
 /* (pre binutils 2.28, all kernel capable clang versions support SVE) */
 
+/* STR (vector): STR Z\nz, [X\nxbase, #\offset, MUL VL] */
+.macro _sve_str_v nz, nxbase, offset=0
+_sve_check_zreg \nz
+_check_general_reg \nxbase
+_check_num (\offset), -0x100, 0xff
+.inst 0xe5804000\
+| (\nz) \
+| ((\nxbase) << 5)  \
+| (((\offset) & 7) << 10)   \
+| (((\offset) & 0x1f8) << 13)
+.endm
+
+/* LDR (vector): LDR Z\nz, [X\nxbase, #\offset, MUL VL] */
+.macro _sve_ldr_v nz, nxbase, offset=0
+_sve_check_zreg \nz
+_check_general_reg \nxbase
+_check_num (\offset), -0x100, 0xff
+.inst 0x85804000\
+| (\nz) \
+| ((\nxbase) << 5)  \
+| (((\offset) & 7) << 10)   \
+| (((\offset) & 0x1f8) << 13)
+.endm
+
+/* STR (predicate): STR P\np, [X\nxbase, #\offset, MUL VL] */
+.macro _sve_str_p np, nxbase, offset=0
+_sve_check_preg \np
+_check_general_reg \nxbase
+_check_num (\offset), -0x100, 0xff
+.inst 0xe580\
+| (\np) \
+| ((\nxbase) << 5)  \
+| (((\offset) & 7) << 10)   \
+| (((\offset) & 0x1f8) << 13)
+.endm
+
+/* LDR (predicate): LDR P\np, [X\nxbase, #\offset, MUL VL] */
+.macro _sve_ldr_p np, nxbase, offset=0
+_sve_check_preg \np
+_check_general_reg \nxbase
+_check_num (\offset), -0x100, 0xff
+.inst 0x8580\
+| (\np) \
+| ((\nxbase) << 5)  \
+| (((\offset) & 7) << 10)   \
+| (((\offset) & 0x1f8) << 13)
+.endm
+
 /* RDVL X\nx, #\imm */
 .macro _sve_rdvl nx, imm
 _check_general_reg \nx
@@ -35,11 +95,92 @@
 | (((\imm) & 0x3f) << 5)
 .endm
 
+/* RDFFR (unpredicated): RDFFR P\np.B */
+.macro _sve_rdffr np
+_sve_check_preg \np
+.inst 0x2519f000\
+| (\np)
+.endm
+
+/* WRFFR P\np.B */
+.macro _sve_wrffr np
+_sve_check_preg \np
+.inst 0x25289000\
+| ((\np) << 5)
+.endm
+
+.macro __for from:req, to:req
+.if (\from) == (\to)
+_for__body %\from
+.else
+__for %\from, %((\from) + ((\to) - (\from)) / 2)
+__for %((\from) + ((\to) - (\from)) / 2 + 1), %\to
+.endif
+.endm
+
+.macro _for var:req, from:req, to:req, insn:vararg
+.macro _for__body \var:req
+.noaltmacro
+\insn
+.altmacro
+.endm
+
+.altmacro
+__for \from, \to
+.noaltmacro
+
+.purgem _for__body
+.endm
+
+.macro sve_save nxzffrctx, nxpctx, save_ffr
+_for n, 0, 31, _sve_str_v \n, \nxzffrctx, \n - 32
+_for n, 0, 15, _sve_str_p \n, \nxpctx, 

Re: [PATCH v2] hvc/xen: prevent concurrent accesses to the shared ring

2023-03-17 Thread Juergen Gross

On 30.11.22 16:09, Roger Pau Monne wrote:

The hvc machinery registers both a console and a tty device based on
the hv ops provided by the specific implementation.  Those two
interfaces however have different locks, and there's no single locks
that's shared between the tty and the console implementations, hence
the driver needs to protect itself against concurrent accesses.
Otherwise concurrent calls using the split interfaces are likely to
corrupt the ring indexes, leaving the console unusable.

Introduce a lock to xencons_info to serialize accesses to the shared
ring.  This is only required when using the shared memory console,
concurrent accesses to the hypercall based console implementation are
not an issue.

Note the conditional logic in domU_read_console() is slightly modified
so the notify_daemon() call can be done outside of the locked region:
it's an hypercall and there's no need for it to be done with the lock
held.

Fixes: b536b4b96230 ('xen: use the hvc console infrastructure for Xen console')
Signed-off-by: Roger Pau Monné 


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] x86/Xen: make use of IBPB controlling VM assist

2023-03-17 Thread Juergen Gross

On 15.02.23 09:31, Jan Beulich wrote:

On 15.02.2023 01:07, Boris Ostrovsky wrote:


On 2/14/23 6:53 PM, Boris Ostrovsky wrote:


On 2/14/23 11:13 AM, Jan Beulich wrote:


--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -18,6 +18,8 @@
   #include 
   #include 
   +#include 
+
   #include 
   #include 
   #include 
@@ -32,6 +34,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
     #include "cpu.h"
@@ -934,7 +937,8 @@ do_cmd_auto:
   break;
     case RETBLEED_MITIGATION_IBPB:
-    setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB);
+    if (!xen_pv_domain() || xen_vm_assist_ibpb(true))



Is this going to compile without CONFIG_XEN?


Yes. The declaration of xen_vm_assist_ibpb() is visible (satisfying
the compiler) and DCE will eliminate the call to the function due to
xen_pv_domain() being constant "false" in that case, avoiding any
linking issues. The interesting case here really is building with
XEN but without XEN_PV: That's why I needed to put the function in
enlighten.c. This wouldn't be needed if xen_pv_domain() was also
constant "false" in that case (just like xen_pvh_domain() is when
!XEN_PVH).


I also think these two conditions should be wrapped into something to limit 
exposure of non-Xen code to Xen-specific primitives.


I would have done so, if I had any halfway sensible idea on how to
go about doing so in this particular case. In the absence of that it
looked okay-ish to me to reference Xen functions directly here.


Oh, and this needs x86 maintainers.


Eventually yes. But I would prefer to sort the above question first
(which I'm sure would have been raised by them, in perhaps more
harsh a way), hence the initially limited exposure.


I'd rather add _one_ hook for Xen-PV in check_bugs() just before the call
of arch_smt_update(). This can then correct any needed mitigation settings.

So something like (note that due to using cpu_feature_enabled(X86_FEATURE_XENPV)
DCE is happening in case CONFIG_XEN_PV isn't defined)":

--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -63,4 +63,7 @@ void __init xen_pvh_init(struct boot_params *boot_params);
 void __init mem_map_via_hcall(struct boot_params *boot_params_p);
 #endif

+int __init xen_vm_assist_ibpb(bool enable);
+void __init xen_pv_fix_mitigations(void);
+
 #endif /* _ASM_X86_XEN_HYPERVISOR_H */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -18,6 +18,8 @@
 #include 
 #include 

+#include 
+
 #include 
 #include 
 #include 
@@ -177,6 +179,9 @@ void __init check_bugs(void)
srbds_select_mitigation();
l1d_flush_select_mitigation();

+   if (cpu_feature_enabled(X86_FEATURE_XENPV))
+   xen_pv_fix_mitigations();
+
arch_smt_update();

 #ifdef CONFIG_X86_32
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1476,6 +1476,23 @@ static uint32_t __init xen_platform_pv(void)
return 0;
 }

+int __init xen_vm_assist_ibpb(bool enable)
+{
+   /*
+* Note that the VM-assist is a disable, so a request to enable IBPB
+* on our behalf needs to turn the functionality off (and vice versa).
+*/
+   return HYPERVISOR_vm_assist(enable ? VMASST_CMD_disable
+  : VMASST_CMD_enable,
+   VMASST_TYPE_mode_switch_no_ibpb);
+}
+
+void __init xen_pv_fix_mitigations(void)
+{
+   if (!xen_vm_assist_ibpb(true))
+   setup_clear_cpu_cap(X86_FEATURE_ENTRY_IBPB);
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_xen_pv = {
.name   = "Xen PV",
.detect = xen_platform_pv,
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -413,7 +413,15 @@ DEFINE_GUEST_HANDLE_STRUCT(mmuext_op);
  */
 #define VMASST_TYPE_runstate_update_flag 5

-#define MAX_VMASST_TYPE 5
+/*
+ * x86-64 guests: Suppress IBPB on guest-user to guest-kernel mode switch.
+ *
+ * By default (on affected and capable hardware) as a safety measure Xen,
+ * to cover for the fact that guest-kernel and guest-user modes are both
+ * running in ring 3 (and hence share prediction context), would issue a
+ * barrier for user->kernel mode switches of PV guests.
+ */
+#define VMASST_TYPE_mode_switch_no_ibpb  33

 #ifndef __ASSEMBLY__


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] x86/Xen: make use of IBPB controlling VM assist

2023-03-17 Thread Andrew Cooper
On 17/03/2023 1:56 pm, Juergen Gross wrote:
> On 15.02.23 09:31, Jan Beulich wrote:
>> On 15.02.2023 01:07, Boris Ostrovsky wrote:
>>>
>>> On 2/14/23 6:53 PM, Boris Ostrovsky wrote:

 On 2/14/23 11:13 AM, Jan Beulich wrote:

> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -18,6 +18,8 @@
>    #include 
>    #include 
>    +#include 
> +
>    #include 
>    #include 
>    #include 
> @@ -32,6 +34,7 @@
>    #include 
>    #include 
>    #include 
> +#include 
>    #include 
>      #include "cpu.h"
> @@ -934,7 +937,8 @@ do_cmd_auto:
>    break;
>      case RETBLEED_MITIGATION_IBPB:
> -    setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB);
> +    if (!xen_pv_domain() || xen_vm_assist_ibpb(true))


 Is this going to compile without CONFIG_XEN?
>>
>> Yes. The declaration of xen_vm_assist_ibpb() is visible (satisfying
>> the compiler) and DCE will eliminate the call to the function due to
>> xen_pv_domain() being constant "false" in that case, avoiding any
>> linking issues. The interesting case here really is building with
>> XEN but without XEN_PV: That's why I needed to put the function in
>> enlighten.c. This wouldn't be needed if xen_pv_domain() was also
>> constant "false" in that case (just like xen_pvh_domain() is when
>> !XEN_PVH).
>>
 I also think these two conditions should be wrapped into something
 to limit exposure of non-Xen code to Xen-specific primitives.
>>
>> I would have done so, if I had any halfway sensible idea on how to
>> go about doing so in this particular case. In the absence of that it
>> looked okay-ish to me to reference Xen functions directly here.
>>
>>> Oh, and this needs x86 maintainers.
>>
>> Eventually yes. But I would prefer to sort the above question first
>> (which I'm sure would have been raised by them, in perhaps more
>> harsh a way), hence the initially limited exposure.
>
> I'd rather add _one_ hook for Xen-PV in check_bugs() just before the call
> of arch_smt_update(). This can then correct any needed mitigation
> settings.
>
> So something like (note that due to using
> cpu_feature_enabled(X86_FEATURE_XENPV)
> DCE is happening in case CONFIG_XEN_PV isn't defined)":
>
> --- a/arch/x86/include/asm/xen/hypervisor.h
> +++ b/arch/x86/include/asm/xen/hypervisor.h
> @@ -63,4 +63,7 @@ void __init xen_pvh_init(struct boot_params
> *boot_params);
>  void __init mem_map_via_hcall(struct boot_params *boot_params_p);
>  #endif
>
> +int __init xen_vm_assist_ibpb(bool enable);
> +void __init xen_pv_fix_mitigations(void);

I'd suggest 'adjust' in preference to 'fix', but this could equally be
xen_pv_mitigations().

XenPV has very legitimate reasons to adjust things owing to it running
in Ring3, but "fix" comes with other implications too.

> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1476,6 +1476,23 @@ static uint32_t __init xen_platform_pv(void)
>     return 0;
>  }
>
> +int __init xen_vm_assist_ibpb(bool enable)
> +{
> +   /*
> +    * Note that the VM-assist is a disable, so a request to
> enable IBPB
> +    * on our behalf needs to turn the functionality off (and vice
> versa).
> +    */
> +   return HYPERVISOR_vm_assist(enable ? VMASST_CMD_disable
> +  : VMASST_CMD_enable,
> +   VMASST_TYPE_mode_switch_no_ibpb);
> +}
> +
> +void __init xen_pv_fix_mitigations(void)
> +{
> +   if (!xen_vm_assist_ibpb(true))
> +   setup_clear_cpu_cap(X86_FEATURE_ENTRY_IBPB);

If nothing else, this needs a comment explaining what's going on.

"Xen PV guest kernels run in ring3, so share the same prediction domain
as userspace.  Xen (since version $X) default to issuing an IBPB on
guest user -> guest kernel transitions on behalf of the guest kernel. 
If Linux isn't depending on mode based prediction separation, turn this
behaviour off".

But this does open the next question.  Yes, unilaterally turning turning
this off restores the prior behaviour, but is this really the best thing
to do ?

~Andrew



Re: [PATCH] x86/Xen: make use of IBPB controlling VM assist

2023-03-17 Thread Juergen Gross

On 17.03.23 15:21, Andrew Cooper wrote:

On 17/03/2023 1:56 pm, Juergen Gross wrote:

On 15.02.23 09:31, Jan Beulich wrote:

On 15.02.2023 01:07, Boris Ostrovsky wrote:


On 2/14/23 6:53 PM, Boris Ostrovsky wrote:


On 2/14/23 11:13 AM, Jan Beulich wrote:


--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -18,6 +18,8 @@
    #include 
    #include 
    +#include 
+
    #include 
    #include 
    #include 
@@ -32,6 +34,7 @@
    #include 
    #include 
    #include 
+#include 
    #include 
      #include "cpu.h"
@@ -934,7 +937,8 @@ do_cmd_auto:
    break;
      case RETBLEED_MITIGATION_IBPB:
-    setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB);
+    if (!xen_pv_domain() || xen_vm_assist_ibpb(true))



Is this going to compile without CONFIG_XEN?


Yes. The declaration of xen_vm_assist_ibpb() is visible (satisfying
the compiler) and DCE will eliminate the call to the function due to
xen_pv_domain() being constant "false" in that case, avoiding any
linking issues. The interesting case here really is building with
XEN but without XEN_PV: That's why I needed to put the function in
enlighten.c. This wouldn't be needed if xen_pv_domain() was also
constant "false" in that case (just like xen_pvh_domain() is when
!XEN_PVH).


I also think these two conditions should be wrapped into something
to limit exposure of non-Xen code to Xen-specific primitives.


I would have done so, if I had any halfway sensible idea on how to
go about doing so in this particular case. In the absence of that it
looked okay-ish to me to reference Xen functions directly here.


Oh, and this needs x86 maintainers.


Eventually yes. But I would prefer to sort the above question first
(which I'm sure would have been raised by them, in perhaps more
harsh a way), hence the initially limited exposure.


I'd rather add _one_ hook for Xen-PV in check_bugs() just before the call
of arch_smt_update(). This can then correct any needed mitigation
settings.

So something like (note that due to using
cpu_feature_enabled(X86_FEATURE_XENPV)
DCE is happening in case CONFIG_XEN_PV isn't defined)":

--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -63,4 +63,7 @@ void __init xen_pvh_init(struct boot_params
*boot_params);
  void __init mem_map_via_hcall(struct boot_params *boot_params_p);
  #endif

+int __init xen_vm_assist_ibpb(bool enable);
+void __init xen_pv_fix_mitigations(void);


I'd suggest 'adjust' in preference to 'fix', but this could equally be
xen_pv_mitigations().

XenPV has very legitimate reasons to adjust things owing to it running
in Ring3, but "fix" comes with other implications too.


--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1476,6 +1476,23 @@ static uint32_t __init xen_platform_pv(void)
     return 0;
  }

+int __init xen_vm_assist_ibpb(bool enable)
+{
+   /*
+    * Note that the VM-assist is a disable, so a request to
enable IBPB
+    * on our behalf needs to turn the functionality off (and vice
versa).
+    */
+   return HYPERVISOR_vm_assist(enable ? VMASST_CMD_disable
+  : VMASST_CMD_enable,
+   VMASST_TYPE_mode_switch_no_ibpb);
+}
+
+void __init xen_pv_fix_mitigations(void)
+{
+   if (!xen_vm_assist_ibpb(true))
+   setup_clear_cpu_cap(X86_FEATURE_ENTRY_IBPB);


If nothing else, this needs a comment explaining what's going on.

"Xen PV guest kernels run in ring3, so share the same prediction domain
as userspace.  Xen (since version $X) default to issuing an IBPB on
guest user -> guest kernel transitions on behalf of the guest kernel.
If Linux isn't depending on mode based prediction separation, turn this
behaviour off".

But this does open the next question.  Yes, unilaterally turning turning
this off restores the prior behaviour, but is this really the best thing
to do ?


I believe this question is primarily for Jan, as he is the initial author of
the patch.

I was just suggesting a variant which is IMHO more probable to be accepted by
the x86 maintainers. :-)


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v4] x86/HVM: support emulated UMIP

2023-03-17 Thread Roger Pau Monné
On Thu, Apr 15, 2021 at 11:47:42AM +0200, Jan Beulich wrote:
> There are three noteworthy drawbacks:
> 1) The intercepts we need to enable here are CPL-independent, i.e. we
>now have to emulate certain instructions for ring 0.
> 2) On VMX there's no intercept for SMSW, so the emulation isn't really
>complete there.

Then I'm afraid we can't set the bit in the max CPUID policy.  What
about domains being migrated from a host that has UMIP to an Intel
host where UMIP is emulated?  They would see a change in behavior in
SMSW, and the behavior won't match the ISA anymore.

Thanks, Roger.



Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh

2023-03-17 Thread Alex Deucher
On Thu, Mar 16, 2023 at 7:09 PM Stefano Stabellini
 wrote:
>
> On Thu, 16 Mar 2023, Juergen Gross wrote:
> > On 16.03.23 14:53, Alex Deucher wrote:
> > > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross  wrote:
> > > >
> > > > On 16.03.23 14:45, Alex Deucher wrote:
> > > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich  wrote:
> > > > > >
> > > > > > On 16.03.2023 00:25, Stefano Stabellini wrote:
> > > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote:
> > > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote:
> > > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote:
> > > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote:
> > > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of
> > > > > > > > > > > hardware
> > > > > > > > > > > virtualization support when possible. It will using the
> > > > > > > > > > > hardware IOMMU
> > > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if
> > > > > > > > > > > current domain is
> > > > > > > > > > > Xen PVH.
> > > > > > > > > >
> > > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how 
> > > > > > > > > > can
> > > > > > > > > > it get
> > > > > > > > > > away without resorting to swiotlb in certain cases (like I/O
> > > > > > > > > > to an
> > > > > > > > > > address-restricted device)?
> > > > > > > > >
> > > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, 
> > > > > > > > > there
> > > > > > > > > is no
> > > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by
> > > > > > > > > the IOMMU
> > > > > > > > > so we can use guest physical addresses instead of machine
> > > > > > > > > addresses for
> > > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is
> > > > > > > > > available
> > > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the
> > > > > > > > > corresponding
> > > > > > > > > case is XENFEAT_not_direct_mapped).
> > > > > > > >
> > > > > > > > But how does Xen using an IOMMU help with, as said,
> > > > > > > > address-restricted
> > > > > > > > devices? They may still need e.g. a 32-bit address to be
> > > > > > > > programmed in,
> > > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O
> > > > > > > > buffers
> > > > > > > > may fulfill this requirement.
> > > > > > >
> > > > > > > In short, it is going to work as long as Linux has guest physical
> > > > > > > addresses (not machine addresses, those could be anything) lower
> > > > > > > than
> > > > > > > 4GB.
> > > > > > >
> > > > > > > If the address-restricted device does DMA via an IOMMU, then the
> > > > > > > device
> > > > > > > gets programmed by Linux using its guest physical addresses (not
> > > > > > > machine
> > > > > > > addresses).
> > > > > > >
> > > > > > > The 32-bit restriction would be applied by Linux to its choice of
> > > > > > > guest
> > > > > > > physical address to use to program the device, the same way it 
> > > > > > > does
> > > > > > > on
> > > > > > > native. The device would be fine as it always uses Linux-provided
> > > > > > > <4GB
> > > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), 
> > > > > > > we
> > > > > > > could get any address, including >4GB addresses, and that is
> > > > > > > expected to
> > > > > > > work.
> > > > > >
> > > > > > I understand that's the "normal" way of working. But whatever the
> > > > > > swiotlb
> > > > > > is used for in baremetal Linux, that would similarly require its use
> > > > > > in
> > > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look
> > > > > > to
> > > > > > me like an incomplete attempt to disable its use altogether on x86.
> > > > > > What
> > > > > > difference of PVH vs baremetal am I missing here?
> > > > >
> > > > > swiotlb is not usable for GPUs even on bare metal.  They often have
> > > > > hundreds or megs or even gigs of memory mapped on the device at any
> > > > > given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > > > > the chip family).
> > > >
> > > > But the swiotlb isn't per device, but system global.
> > >
> > > Sure, but if the swiotlb is in use, then you can't really use the GPU.
> > > So you get to pick one.
> >
> > The swiotlb is used only for buffers which are not within the DMA mask of a
> > device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA 
> > mask
> > won't use the swiotlb unless you have a buffer above guest physical address 
> > of
> > 16TB (so basically never).
> >
> > Disabling swiotlb in such a guest would OTOH mean, that a device with only
> > 32 bit DMA mask passed through to this guest couldn't work with buffers
> > above 4GB.
> >
> > I don't think this is acceptable.
>
> From the Xen subsystem in Linux point of view, the only thing we need to
> do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
> the global swiotlb) on PVH because it is not needed anyway.
>
> I think we should leave the global "swiotlb" setti

Re: [PATCH v3 1/6] xen: add reference counter support

2023-03-17 Thread Jan Beulich
On 17.03.2023 11:05, Roger Pau Monné wrote:
> On Thu, Mar 16, 2023 at 05:56:00PM +0100, Jan Beulich wrote:
>> On 16.03.2023 17:48, Roger Pau Monné wrote:
>>> On Thu, Mar 16, 2023 at 05:43:18PM +0100, Jan Beulich wrote:
 On 16.03.2023 17:39, Roger Pau Monné wrote:
> On Thu, Mar 16, 2023 at 05:32:38PM +0100, Jan Beulich wrote:
>> On 16.03.2023 17:19, Roger Pau Monné wrote:
>>> On Tue, Mar 14, 2023 at 08:56:29PM +, Volodymyr Babchuk wrote:
 +static inline void refcnt_get(refcnt_t *refcnt)
 +{
 +int old = atomic_add_unless(&refcnt->refcnt, 1, 0);
>>>
>>> Occurred to me while looking at the next patch:
>>>
>>> Don't you also need to print a warning (and saturate the counter
>>> maybe?) if old == 0, as that would imply the caller is attempting
>>> to take a reference of an object that should be destroyed? IOW: it
>>> would point to some kind of memory leak.
>>
>> Hmm, I notice the function presently returns void. I think what to do
>> when the counter is zero needs leaving to the caller. See e.g.
>> get_page() which will simply indicate failure to the caller in case
>> the refcnt is zero. (There overflow handling also is left to the
>> caller ... All that matters is whether a ref can be acquired.)
>
> Hm, likely.  I guess pages never go away even when it's refcount
> reaches 0.
>
> For the pdev case attempting to take a refcount on an object that has
> 0 refcounts implies that the caller is using leaked memory, as the
> point an object reaches 0 it supposed to be destroyed.

 Hmm, my thinking was that a device would remain at refcnt 0 until it is
 actually removed, i.e. refcnt == 0 being a prereq for pci_remove_device()
 to be willing to do anything at all. But maybe that's not a viable model.
>>>
>>> Right, I think the intention was for pci_remove_device() to drop the
>>> refcount to 0 and do the removal, so the refcount should be 1 when
>>> calling pci_remove_device().  But none of this is written down, so
>>> it's mostly my assumptions from looking at the code.
>>
>> Could such work at all? The function can't safely drop a reference
>> and _then_ check whether it was the last one. The function either has
>> to take refcnt == 0 as prereq, or it needs to be the destructor
>> function that refcnt_put() calls.
> 
> But then you also get in the trouble of asserting that refcnt == 0
> doesn't change between evaluation and actual removal of the structure.
> 
> Should all refcounts to pdev be taken and dropped while holding the
> pcidevs lock?
> 
> I there an email (outside of this series) that contains a description
> of how the refcounting is to be used with pdevs?

I'm not aware of one. The intentions indeed need outlining somewhere.

Jan



Re: [PATCH 4/4] Update README to state Python3 requirement

2023-03-17 Thread Jan Beulich
On 17.03.2023 13:37, George Dunlap wrote:
> On Fri, Mar 17, 2023 at 8:46 AM Jan Beulich  wrote:
> 
>> On 16.03.2023 18:16, Marek Marczykowski-Górecki wrote:
>>> Python2 is not supported anymore.
>>
>> There are two things here which concern me: For one, how come this is
>> at the end of a series? You want to keep in mind that any series may
>> be committed piecemeal (unless an indication to the contrary is in
>> the cover letter, but there's none here in the first place).
>>
>> The other aspect is that there's no indication here of it being
>> consensus that we raise the baseline requirement for Python, and for
>> Python alone. A decision towards the wider topic of raising baseline
>> requirements is, as you may recall from the meeting in Cambridge,
>> still pending.
>>
> 
> To me, the idea behind that discussion was that if we agree on a policy --
> or at least general principles -- then we can avoid having to have
> discussions on a case-by-case basis.  The fact that the discussion is still
> open isn't a reason not to deprecate features; it just means that we still
> need to discuss each one on its merits.
> 
> Given that Python 2 was announced unsupported years ago now, it seems
> obvious to me that we should stop trying to support it.
> 
> Are you arguing that we *should* continue to support Python2?  Do you think
> anyone will?

I think we really need basics of a policy first. Otherwise what can or
cannot be proposed to no longer be supported is just too arbitrary. Here
as well as elsewhere my fear is that thing would stop building on rather
old distros, where so far things have been building fine. (For Py3 in
particular that's no _that_ much of a concern, because quite some time
ago qemu already started requiring it. But it would still be a change in
how I would need to invoke builds on such old systems that I try Xen out
on every once in a while, because I'd then need to override the Python
to use not just for qemu, but globally.)

Jan



Re: [PATCH v2] xen/console: Skip switching serial input to non existing domains

2023-03-17 Thread Jan Beulich
On 17.03.2023 10:32, Michal Orzel wrote:
> 
> 
> On 17/03/2023 09:36, Jan Beulich wrote:
>>
>>
>> On 16.03.2023 23:59, Stefano Stabellini wrote:
>>> On Thu, 16 Mar 2023, Jan Beulich wrote:
 On 16.03.2023 11:26, Michal Orzel wrote:
> --- a/xen/drivers/char/console.c
> +++ b/xen/drivers/char/console.c
> @@ -490,7 +490,24 @@ static void switch_serial_input(void)
>  }
>  else
>  {
> -console_rx++;
> +unsigned int next_rx = console_rx + 1;
> +
> +/* Skip switching serial input to non existing domains */
> +while ( next_rx < max_init_domid + 1 )
> +{
> +struct domain *d = rcu_lock_domain_by_id(next_rx - 1);
> +
> +if ( d )
> +{
> +rcu_unlock_domain(d);
> +break;
> +}
> +
> +next_rx++;
> +}
> +
> +console_rx = next_rx;
> +
>  printk("*** Serial input to DOM%d", console_rx - 1);
>  }

 While at the first glance (when you sent it in reply to v1) it looked okay,
 I'm afraid it really isn't: Please consider what happens when the last of
 the DomU-s doesn't exist anymore. (You don't really check whether it still
 exists, because the range check comes ahead of the existence one.) In that
 case you want to move from second-to-last to Xen. I expect the entire
 if/else construct wants to be inside the loop.
>>>
>>> I don't think we need another loop, just a check if we found a domain or
>>
>> I didn't say "another loop", but I suggested that the loop needs to be
>> around the if/else. Of course this can be transformed into equivalent
>> forms, like ...
>>
>>> not. E.g.:
>>>
>>>
>>> unsigned int next_rx = console_rx + 1;
>>>
>>> /* Skip switching serial input to non existing domains */
>>> while ( next_rx < max_init_domid + 1 )
>>> {
>>> struct domain *d = rcu_lock_domain_by_id(next_rx - 1);
>>>
>>> if ( d )
>>> {
>>> rcu_unlock_domain(d);
>>> console_rx = next_rx;
>>> printk("*** Serial input to DOM%d", console_rx - 1);
>>> break;
>>> }
>>>
>>> next_rx++;
>>> }
>>>
>>> /* no domain found */
>>> console_rx = 0;
>>> printk("*** Serial input to Xen");
>>
>> ... what you suggest (or at least almost, because the way it's written
>> we'd always switch to Xen).
> 
> I would prefer a loop with if/else inside. If you are ok with the following 
> code
> that handles all the cases, I will push a patch in a minute:

Looks roughly okay, but recall I said so also when you "pre-posted" the
previous version.

Jan



Re: [PATCH v8 1/5] xen: introduce CONFIG_GENERIC_BUG_FRAME

2023-03-17 Thread Jan Beulich
On 17.03.2023 10:23, Oleksii wrote:
> On Thu, 2023-03-16 at 12:26 +0100, Jan Beulich wrote:
>> On 15.03.2023 18:21, Oleksii Kurochko wrote:
>>> --- /dev/null
>>> +++ b/xen/common/bug.c
>>> @@ -0,0 +1,108 @@
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>
>> I actually meant to also ask: What is this needed for? Glancing over
>> the
>> code ...
>>
>>> +/*
>>> + * Returns a negative value in case of an error otherwise
>>> + * BUGFRAME_{run_fn, warn, bug, assert}
>>> + */
>>> +int do_bug_frame(struct cpu_user_regs *regs, unsigned long pc)
>>> +{
>>> +    const struct bug_frame *bug = NULL;
>>> +    const struct virtual_region *region;
>>> +    const char *prefix = "", *filename, *predicate;
>>> +    unsigned long fixup;
>>> +    unsigned int id, lineno;
>>> +
>>> +    region = find_text_region(pc);
>>> +    if ( !region )
>>> +    return -EINVAL;
>>> +
>>> +    for ( id = 0; id < BUGFRAME_NR; id++ )
>>> +    {
>>> +    const struct bug_frame *b;
>>> +    size_t i;
>>> +
>>> +    for ( i = 0, b = region->frame[id].bugs;
>>> +  i < region->frame[id].n_bugs; b++, i++ )
>>> +    {
>>> +    if ( bug_loc(b) == pc )
>>> +    {
>>> +    bug = b;
>>> +    goto found;
>>> +    }
>>> +    }
>>> +    }
>>> +
>>> + found:
>>> +    if ( !bug )
>>> +    return -ENOENT;
>>> +
>>> +    if ( id == BUGFRAME_run_fn )
>>> +    {
>>> +    void (*fn)(struct cpu_user_regs *) = bug_ptr(bug);
>>> +
>>> +    fn(regs);
>>> +
>>> +    /* Re-enforce consistent types, because of the casts
>>> involved. */
>>> +    if ( false )
>>> +    run_in_exception_handler(fn);
>>> +
>>> +    return id;
>>> +    }
>>> +
>>> +    /* WARN, BUG or ASSERT: decode the filename pointer and line
>>> number. */
>>> +    filename = bug_ptr(bug);
>>> +    if ( !is_kernel(filename) && !is_patch(filename) )
>>> +    return -EINVAL;
>>> +    fixup = strlen(filename);
>>> +    if ( fixup > 50 )
>>> +    {
>>> +    filename += fixup - 47;
>>> +    prefix = "...";
>>> +    }
>>> +    lineno = bug_line(bug);
>>> +
>>> +    switch ( id )
>>> +    {
>>> +    case BUGFRAME_warn:
>>> +    printk("Xen WARN at %s%s:%d\n", prefix, filename, lineno);
>>> +    show_execution_state(regs);
>>> +
>>> +    break;
>>> +
>>> +    case BUGFRAME_bug:
>>> +    printk("Xen BUG at %s%s:%d\n", prefix, filename, lineno);
>>> +
>>> +    if ( BUG_DEBUGGER_TRAP_FATAL(regs) )
>>> +    break;
>>> +
>>> +    show_execution_state(regs);
>>> +    panic("Xen BUG at %s%s:%d\n", prefix, filename, lineno);
>>> +
>>> +    case BUGFRAME_assert:
>>> +    /* ASSERT: decode the predicate string pointer. */
>>> +    predicate = bug_msg(bug);
>>> +    if ( !is_kernel(predicate) && !is_patch(predicate) )
>>> +    predicate = "";
>>> +
>>> +    printk("Assertion '%s' failed at %s%s:%d\n",
>>> +   predicate, prefix, filename, lineno);
>>> +
>>> +    if ( BUG_DEBUGGER_TRAP_FATAL(regs) )
>>> +    break;
>>> +
>>> +    show_execution_state(regs);
>>> +    panic("Assertion '%s' failed at %s%s:%d\n",
>>> +  predicate, prefix, filename, lineno);
>>> +    }
>>> +
>>> +    return id;
>>> +}
>>
>> ... I can't really spot what it might be that comes from that header.
>> Oh, on the N+1st run I've spotted it - it's show_execution_state().
>> The declaration of which, already being used from common code ahead
>> of this series, should imo be moved to a common header. I guess I'll
>> make yet another patch ...
> As mentioned above. Not only show_execution_state() but also
> cpu_user_regs structure. ( at lest, for ARM & RISCV )

Do we deref "regs" anywhere? I can't seem to be able to spot an instance.
Without a deref (or alike) a forward decl is all that's needed for this
code to compile.

Jan



Re: [PATCH v4] x86/HVM: support emulated UMIP

2023-03-17 Thread Jan Beulich
On 17.03.2023 15:29, Roger Pau Monné wrote:
> On Thu, Apr 15, 2021 at 11:47:42AM +0200, Jan Beulich wrote:
>> There are three noteworthy drawbacks:
>> 1) The intercepts we need to enable here are CPL-independent, i.e. we
>>now have to emulate certain instructions for ring 0.
>> 2) On VMX there's no intercept for SMSW, so the emulation isn't really
>>complete there.
> 
> Then I'm afraid we can't set the bit in the max CPUID policy.  What
> about domains being migrated from a host that has UMIP to an Intel
> host where UMIP is emulated?  They would see a change in behavior in
> SMSW, and the behavior won't match the ISA anymore.

Right, but that's the price to pay if we want such emulation (which back
at the time did look at least desirable, because the other affected insns
are more important to deal with). Not setting the bit in the max policy
is as good as not having emulation on VMX at all then.

Jan



[xen-unstable-smoke test] 179720: tolerable trouble: pass/starved - PUSHED

2023-03-17 Thread osstest service owner
flight 179720 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/179720/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl   1 build-check(1)   starved  n/a
 build-armhf   2 hosts-allocate   starved  n/a

version targeted for testing:
 xen  9bf21fcaef07f68ab52d0382ff554616a1cf66d8
baseline version:
 xen  9c0061825143716c61622966e76983886ef59361

Last test of basis   179716  2023-03-17 11:01:55 Z0 days
Testing same since   179720  2023-03-17 13:01:58 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Michal Orzel 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  starved 
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  starved 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   9c00618251..9bf21fcaef  9bf21fcaef07f68ab52d0382ff554616a1cf66d8 -> smoke



Re: [PATCH v4] x86/HVM: support emulated UMIP

2023-03-17 Thread Roger Pau Monné
On Fri, Mar 17, 2023 at 04:01:59PM +0100, Jan Beulich wrote:
> On 17.03.2023 15:29, Roger Pau Monné wrote:
> > On Thu, Apr 15, 2021 at 11:47:42AM +0200, Jan Beulich wrote:
> >> There are three noteworthy drawbacks:
> >> 1) The intercepts we need to enable here are CPL-independent, i.e. we
> >>now have to emulate certain instructions for ring 0.
> >> 2) On VMX there's no intercept for SMSW, so the emulation isn't really
> >>complete there.
> > 
> > Then I'm afraid we can't set the bit in the max CPUID policy.  What
> > about domains being migrated from a host that has UMIP to an Intel
> > host where UMIP is emulated?  They would see a change in behavior in
> > SMSW, and the behavior won't match the ISA anymore.
> 
> Right, but that's the price to pay if we want such emulation (which back
> at the time did look at least desirable, because the other affected insns
> are more important to deal with). Not setting the bit in the max policy
> is as good as not having emulation on VMX at all then.

It would need some kind of justification at least on why it's deemed
worth exposing in the max policy (and thus made available to incoming
guests) even when not compliant to the specification.

Could the non-intercaption of CR0 reads and thus no #GP on SMSW on
Intel lead to software malfunctioning as a result?

Thanks, Roger.



Re: [PATCH v3 2/2] x86: detect CMOS aliasing on ports other than 0x70/0x71

2023-03-17 Thread Roger Pau Monné
On Mon, Jul 20, 2020 at 01:11:18PM +0200, Roger Pau Monné wrote:
> On Wed, Jul 15, 2020 at 01:57:07PM +0200, Jan Beulich wrote:
> > ... in order to also intercept accesses through the alias ports.
> > 
> > Also stop intercepting accesses to the CMOS ports if we won't ourselves
> > use the CMOS RTC.
> 
> Will wait for v4 with the fixed additions to PVH dom0.

I think this is waiting for a v4, let me know if that's not the case.

Thanks, Roger.



[linux-5.4 test] 179711: regressions - trouble: fail/pass/starved

2023-03-17 Thread osstest service owner
flight 179711 linux-5.4 real [real]
flight 179722 linux-5.4 real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/179711/
http://logs.test-lab.xenproject.org/osstest/logs/179722/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 179598

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 179598
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 179598
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 179598
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 179598
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 179598
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 179598
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 179598
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 179598
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 179598
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-examine  1 build-check(1)   starved  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   starved  n/a
 build-armhf-libvirt   1 build-check(1)   starved  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   starved  n/a
 test-arm64-arm64-xl   1 build-check(1)   starved  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   starved  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   starved  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   starved  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   starved  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   starved  n/a
 test-armhf-armhf-examine  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   starved  n/a
 build-arm64-pvops 2 hosts-allocate   starved  n/a
 build-armhf   2 hosts-allocate   starved  n/a

version targeted for testing:
 linuxe4b5c766f50525d84754cfdf0e4edef1db9622c0
baseline version:
 linuxb829e8b6e1a7bb8cd3042b927113a22fed0b0d47

Last test of basis   179598  2023-03-13 09:42:37 Z4 days
Testing same since   179711  2023-03-17 07:42:06 Z0 days1 attempts


People who touched revisions under test:
  Al Viro 
  Alex Deucher 
  Alexandre Ghiti 
  Alvaro Karsz 
  Amir Goldstein 
  Andrew Cooper 
  Andrew Morton 
  Ashok Raj 
  Bart Van Assche 
  Bixuan Cui 
  Bjorn Helgaas 
  Borislav Petkov (AMD) 
  Borislav Petkov 
  Chandrakanth Patil 
  Chris Paterson (CIP) 
  Corey Minyard 
  D. Wythe 
  Darrick J. Wong 
  David S. Miller 
  Dmitry Baryshkov 
  Edward Humes 
  Eric Dumazet 
  Eric Whitney 
  Fedor Pchelkin 
  Florian Fainelli 
  Florian Westphal 
  Gavrilov Ilia 
  Greg Kroah-Hartman 
  Guenter Roeck 
  H.J. Lu 
  Hangbin Liu 
  Harry Wentland 
  Harshit Mogalapalli 
  Heiko Carstens 
  Ilia.Gavrilov 
  Jacob Pan 
  Jakub Kicinski 
  Jan Kara 
  Jani Nikula 
  Jens Axboe 
  Joerg Roedel 
  Johan Hovold 
  John Harrison 
  John Paul Adrian Glaubitz 
  Jouni Högander 
  Kang Chen 
  Kim Phillips 
  Laurent Pinchart 
  Lee Jones 
  Liguang Zhang 
  Linux Kernel Functional Testing 
  Lorenz Bauer 
  Lorenz Bauer 
  Lu Baolu 
  Marc Zyngier

Re: [PATCH v4] x86/HVM: support emulated UMIP

2023-03-17 Thread Andrew Cooper
On 17/03/2023 2:29 pm, Roger Pau Monné wrote:
> On Thu, Apr 15, 2021 at 11:47:42AM +0200, Jan Beulich wrote:
>> There are three noteworthy drawbacks:
>> 1) The intercepts we need to enable here are CPL-independent, i.e. we
>>now have to emulate certain instructions for ring 0.
>> 2) On VMX there's no intercept for SMSW, so the emulation isn't really
>>complete there.
> Then I'm afraid we can't set the bit in the max CPUID policy.  What
> about domains being migrated from a host that has UMIP to an Intel
> host where UMIP is emulated?  They would see a change in behavior in
> SMSW, and the behavior won't match the ISA anymore.

There are conflicting opinions on this.  But the truth is that SMSW only
leaks the bottom nibble(?) of CR0 and that simply isn't information that
is of use to an attacker like SGDT/SIDT is.

So from an entirely ideal point of view there is an argument to say that
UMIP-but-can't-block-SMSW is better than no UMIP.


Except, I'm not fully convinced by this argument.

SMSW aside, emulating UMIP on hardware without it involves emulating the
guest being able to set CR4.UMIP which is reserved so we have to
intercept #UD, and intercepting all #GP so we can find the
S{I,LG}DT/STR/SMSW(on AMD) instructions and fail them in Ring3.

We went to a lot of effort to not intercept #UD (by default) because it
exposed x86_emulate() to guest userspace and caused us a huge number of
security headaches.  Similarly, #GP interception is the source of a lot
of security bugs on other hypervisors.

So there is large security concern with this patch.  Which is not a no,
but definitely is a "need to think about this more carefully".

This logic isn't useful for Linux.  All versions of Linux which know
about UMIP already put the IDT and GDT on read-only mappings to prevent
SIDT/SGDT being useful to an attacker on hardware lacking UMIP.  I don't
know what Windows does here, but I would be amazed if they don't
something similar.

Therefore, this logic is only useful for guests which do know about
UIMP, and do not have any other defences against SIDT/SGD.  If this
isn't an empty set of kernels, it will be a small set.


Also, as a note, the XTF UMIP test declares a failure against KVM
(because SMSW does leak), and will do the same on Xen after this patch. 
I don't think OSSTest will declare this to be a blocking regression,
because the XTF test won't be configured for "max", so it won't notice. 
And because I still haven't got the test-version logic working, we can't
modify the XTF UMIP behaviour without breaking the OSSTest bisector...

~Andrew



[PATCH] x86/APIC: modify error_interrupt() to output using single printk()

2023-03-17 Thread Elliott Mitchell
This takes care of the issue of APIC errors tending to occur on multiple
cores at one.  In turn this tends to causes the error messages to be
merged together, making understanding them difficult.

Signed-off-by: Elliott Mitchell 
---
 xen/arch/x86/apic.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index f71474d47d..5399488120 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1412,6 +1412,12 @@ static void cf_check error_interrupt(struct 
cpu_user_regs *regs)
 };
 unsigned int v, v1;
 int i;
+char head[] = XENLOG_DEBUG "APIC error on CPU%u: %02x(%02x)";
+char entry[] = ", %s";
+/* header string, ", %s" for each potential entry, newline */
+char fmt[sizeof(head) + ARRAY_SIZE(esr_fields) * (sizeof(entry) - 1) + 1];
+const char *args[ARRAY_SIZE(esr_fields)];
+unsigned int count = 0;
 
 /* First tickle the hardware, only then report what went on. -- REW */
 v = apic_read(APIC_ESR);
@@ -1419,12 +1425,18 @@ static void cf_check error_interrupt(struct 
cpu_user_regs *regs)
 v1 = apic_read(APIC_ESR);
 ack_APIC_irq();
 
-printk(XENLOG_DEBUG "APIC error on CPU%u: %02x(%02x)",
-smp_processor_id(), v , v1);
+memcpy(fmt, head, sizeof(head) - 1);
+
 for ( i = 7; i >= 0; --i )
-if ( v1 & (1 << i) )
-printk(", %s", esr_fields[i]);
-printk("\n");
+if ( v1 & (1 << i) ) {
+memcpy(fmt + sizeof(head) + (sizeof(entry) - 1) * count - 1, entry,
+sizeof(entry) - 1);
+args[count++] = esr_fields[i];
+}
+memcpy(fmt + sizeof(head) + (sizeof(entry) - 1) * count - 1, "\n", 2);
+/*_Static_assert(ARRAY_SIZE(args) == 8, "printk() here needs args 
added");*/
+printk(fmt, smp_processor_id(), v , v1, args[0], args[1], args[2], args[3],
+args[4], args[5], args[6], args[7]);
 }
 
 /*
-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (|ehem+sig...@m5p.comPGP 87145445|)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





[linux-linus test] 179702: regressions - trouble: fail/pass/starved

2023-03-17 Thread osstest service owner
flight 179702 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/179702/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-freebsd12-amd64 13 guest-start  fail REGR. vs. 178042
 test-amd64-amd64-xl-credit1  19 guest-saverestore.2  fail REGR. vs. 178042
 test-amd64-amd64-xl-shadow   14 guest-start  fail REGR. vs. 178042
 test-amd64-amd64-xl-xsm  14 guest-start  fail REGR. vs. 178042
 test-amd64-amd64-xl  14 guest-start  fail REGR. vs. 178042
 test-amd64-amd64-xl-pvhv2-amd 14 guest-start fail REGR. vs. 178042
 test-amd64-amd64-freebsd11-amd64 13 guest-start  fail REGR. vs. 178042
 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 178042
 test-amd64-amd64-xl-multivcpu 20 guest-localmigrate/x10  fail REGR. vs. 178042
 test-amd64-amd64-libvirt 18 guest-saverestore.2  fail REGR. vs. 178042
 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start   fail REGR. vs. 178042
 test-amd64-amd64-libvirt-pair 26 guest-migrate/src_host/dst_host fail REGR. 
vs. 178042
 test-amd64-amd64-libvirt-xsm 18 guest-saverestore.2  fail REGR. vs. 178042
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. 
vs. 178042
 test-arm64-arm64-xl-thunderx 14 guest-start  fail REGR. vs. 178042
 test-arm64-arm64-xl-xsm  17 guest-stop   fail REGR. vs. 178042
 test-arm64-arm64-xl-credit2  14 guest-start  fail REGR. vs. 178042
 test-arm64-arm64-libvirt-xsm 14 guest-start  fail REGR. vs. 178042
 test-arm64-arm64-xl 18 guest-start/debian.repeat fail REGR. vs. 178042
 test-amd64-amd64-xl-credit2 22 guest-start/debian.repeat fail REGR. vs. 178042
 test-arm64-arm64-xl-credit1 18 guest-start/debian.repeat fail REGR. vs. 178042
 test-amd64-amd64-qemuu-nested-intel 12 debian-hvm-install fail REGR. vs. 178042
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 18 guest-localmigrate/x10 fail 
REGR. vs. 178042
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install 
fail REGR. vs. 178042
 test-amd64-coresched-amd64-xl 14 guest-start fail REGR. vs. 178042
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 12 debian-hvm-install fail 
REGR. vs. 178042
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail 
REGR. vs. 178042
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 
178042
 test-amd64-amd64-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 
178042
 test-amd64-amd64-xl-qemut-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 
178042
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install 
fail REGR. vs. 178042
 test-amd64-amd64-pair25 guest-start/debian   fail REGR. vs. 178042
 test-amd64-amd64-qemuu-nested-amd 12 debian-hvm-install  fail REGR. vs. 178042
 test-amd64-amd64-xl-pvshim   14 guest-start  fail REGR. vs. 178042
 test-amd64-amd64-xl-pvhv2-intel 14 guest-start   fail REGR. vs. 178042
 test-amd64-amd64-xl-vhd  12 debian-di-installfail REGR. vs. 178042
 test-amd64-amd64-pygrub  12 debian-di-installfail REGR. vs. 178042
 test-amd64-amd64-libvirt-raw 12 debian-di-installfail REGR. vs. 178042
 test-amd64-amd64-libvirt-qcow2 12 debian-di-install  fail REGR. vs. 178042
 test-arm64-arm64-xl-vhd  12 debian-di-installfail REGR. vs. 178042
 test-arm64-arm64-libvirt-raw 12 debian-di-installfail REGR. vs. 178042

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 18 guest-localmigrate   fail REGR. vs. 178042

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 178042
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 178042
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 178042
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 178042
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-examine  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   starved  n/a
 test-armhf-a

Re: [BUG] x2apic broken with current AMD hardware

2023-03-17 Thread Elliott Mitchell
On Fri, Mar 17, 2023 at 09:22:09AM +0100, Jan Beulich wrote:
> On 16.03.2023 23:03, Elliott Mitchell wrote:
> > On Mon, Mar 13, 2023 at 08:01:02AM +0100, Jan Beulich wrote:
> >> On 11.03.2023 01:09, Elliott Mitchell wrote:
> >>> On Thu, Mar 09, 2023 at 10:03:23AM +0100, Jan Beulich wrote:
> 
>  In any event you will want to collect a serial log at maximum verbosity.
>  It would also be of interest to know whether turning off the IOMMU avoids
>  the issue as well (on the assumption that your system has less than 255
>  CPUs).
> >>>
> >>> I think I might have figured out the situation in a different fashion.
> >>>
> >>> I was taking a look at the BIOS manual for this motherboard and noticed
> >>> a mention of a "Local APIC Mode" setting.  Four values are listed
> >>> "Compatibility", "xAPIC", "x2APIC", and "Auto".
> >>>
> >>> That is the sort of setting I likely left at "Auto" and that may well
> >>> result in x2 functionality being disabled.  Perhaps the x2APIC
> >>> functionality on AMD is detecting whether the hardware is present, and
> >>> failing to test whether it has been enabled?  (could be useful to output
> >>> a message suggesting enabling the hardware feature)
> >>
> >> Can we please move to a little more technical terms here? What is "present"
> >> and "enabled" in your view? I don't suppose you mean the CPUID bit (which
> >> we check) and the x2APIC-mode-enable one (which we drive as needed). It's
> >> also left unclear what the four modes of BIOS operation evaluate to. Even
> >> if we knew that, overriding e.g. "Compatibility" (which likely means some
> >> form of "disabled" / "hidden") isn't normally an appropriate thing to do.
> >> In "Auto" mode Xen likely should work - the only way I could interpret the
> >> the other modes are "xAPIC" meaning no x2APIC ACPI tables entries (and
> >> presumably the CPUID bit also masked), "x2APIC" meaning x2APIC mode pre-
> >> enabled by firmware, and "Auto" leaving it to the OS to select. Yet that's
> >> speculation on my part ...
> > 
> > I provided the information I had discovered.  There is a setting for this
> > motherboard (likely present on some similar motherboards) which /may/
> > effect the issue.  I doubt I've tried "compatibility", but none of the
> > values I've tried have gotten the system to boot without "x2apic=false"
> > on Xen's command-line.
> > 
> > When setting to "x2APIC" just after "(XEN) AMD-Vi: IOMMU Extended Features:"
> > I see the line "(XEN) - x2APIC".  Later is the line
> > "(XEN) x2APIC mode is already enabled by BIOS."  I'll guess "Auto"
> > leaves the x2APIC turned off since neither line is present.
> 
> When "(XEN) - x2APIC" is absent the IOMMU can't be switched into x2APIC
> mode. Are you sure that's the case when using "Auto"?

grep -eAPIC\ driver -e-\ x2APIC:

"Auto":
(XEN) Using APIC driver default
(XEN) Overriding APIC driver with bigsmp
(XEN) Switched to APIC driver x2apic_cluster

"x2APIC":
(XEN) Using APIC driver x2apic_cluster
(XEN) - x2APIC

Yes, I'm sure.

> > Both cases the line "(XEN) Switched to APIC driver x2apic_cluster" is
> > present (so perhaps "Auto" merely doesn't activate it).
> 
> Did you also try "x2apic_phys" on the Xen command line (just to be sure
> this isn't a clustered-mode only issue)?

No.  In fact x2apic_cluster is mentioned in all failure cases.

> > Appears error_interrupt() needs locking or some concurrency handling
> > mechanism since the last error is jumbled.  With the setting "x2APIC"
> > I get a bunch of:
> > "(XEN) APIC error on CPU#: 00(08)(XEN) APIC error on CPU#: 00(08)"
> > (apparently one for each core)
> > Followed by "Receive accept error, Receive accept error," (again,
> > apparently one for each core).  Then a bunch of newlines (same pattern).
> 
> This is a known issue, but since the messages shouldn't appear in the
> first place so far no-one has bothered to address this.

I won't claim it is the best solution, but see other message.

I'm tempted to propose allowing _Static_assert() since it is valuable
functionality for preventing issues.

> > With the setting "auto" the last message is a series of
> > "(XEN) CPU#: No irq handler for vector ## (IRQ -2147483648, LAPIC)" on
> > 2 different cores.  Rather more of the lines were from one core, the
> > vector value varied some.
> > 
> > Notable both sets of final error messages appeared after the Domain 0
> > kernel thought it had been operating for >30 seconds.  Lack of
> > response to interrupt generating events (pressing keys on USB keyboard)
> > suggests interrupts weren't getting through.
> > 
> > 
> > With "x2apic=false" error messages similar to the "Local APIC Mode"
> > of "x2APIC" appear >45 seconds after Domain 0 kernel start.  Of note
> > first "(XEN) APIC error on CPU#: 00(08)(XEN) APIC error on CPU#: 00(08)"
> > appears for all cores with "Receive accept error,".
> > 
> > Yet later a variation on this message starts appearing:
> > "(XEN) APIC error on CPU#: 08(08)(XEN) APIC error on CPU#: 08(0

Re: [GIT PULL] xen: branch for v6.3-rc3

2023-03-17 Thread pr-tracker-bot
The pull request you sent on Fri, 17 Mar 2023 07:40:23 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git 
> for-linus-6.3-rc3-tag

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/0eb392ec095ee29b932985deefb43e6d86d8a463

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



[libvirt test] 179706: tolerable trouble: pass/starved - PUSHED

2023-03-17 Thread osstest service owner
flight 179706 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/179706/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 build-armhf-libvirt   1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   starved  n/a
 build-armhf   2 hosts-allocate   starved  n/a

version targeted for testing:
 libvirt  5589a3e1f30ab6b9b3247b98f1a6697b0578923a
baseline version:
 libvirt  8386242bd0f6c1cb242f9c711e2ef864bf114d0d

Last test of basis   179640  2023-03-15 04:19:43 Z2 days
Failing since179665  2023-03-16 04:18:51 Z1 days2 attempts
Testing same since   179706  2023-03-17 04:18:50 Z0 days1 attempts


People who touched revisions under test:
  Jiri Denemark 
  Ján Tomko 
  Michal Privoznik 
  Or Ozeri 
  Tim Wiederhake 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  starved 
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  starved 
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt starved 
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   starved 
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw starved 
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports,

Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi

2023-03-17 Thread Stefano Stabellini
On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > >> So yes, it then all boils down to that Linux-
> > >> internal question.
> > > 
> > > Excellent question but we'll have to wait for Ray as he is the one with
> > > access to the hardware. But I have this data I can share in the
> > > meantime:
> > > 
> > > [1.260378] IRQ to pin mappings:
> > > [1.260387] IRQ1 -> 0:1
> > > [1.260395] IRQ2 -> 0:2
> > > [1.260403] IRQ3 -> 0:3
> > > [1.260410] IRQ4 -> 0:4
> > > [1.260418] IRQ5 -> 0:5
> > > [1.260425] IRQ6 -> 0:6
> > > [1.260432] IRQ7 -> 0:7
> > > [1.260440] IRQ8 -> 0:8
> > > [1.260447] IRQ9 -> 0:9
> > > [1.260455] IRQ10 -> 0:10
> > > [1.260462] IRQ11 -> 0:11
> > > [1.260470] IRQ12 -> 0:12
> > > [1.260478] IRQ13 -> 0:13
> > > [1.260485] IRQ14 -> 0:14
> > > [1.260493] IRQ15 -> 0:15
> > > [1.260505] IRQ106 -> 1:8
> > > [1.260513] IRQ112 -> 1:4
> > > [1.260521] IRQ116 -> 1:13
> > > [1.260529] IRQ117 -> 1:14
> > > [1.260537] IRQ118 -> 1:15
> > > [1.260544]  done.
> > 
> > And what does Linux think are IRQs 16 ... 105? Have you compared with
> > Linux running baremetal on the same hardware?
> 
> So I have some emails from Ray from he time he was looking into this,
> and on Linux dom0 PVH dmesg there is:
> 
> [0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec0, GSI 0-23
> [0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 
> 24-55
> 
> So it seems the vIO-APIC data provided by Xen to dom0 is at least
> consistent.
>  
> > > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > > 112 (which is the one causing issues):
> > > 
> > > __acpi_register_gsi->
> > > acpi_register_gsi_ioapic->
> > > mp_map_gsi_to_irq->
> > > mp_map_pin_to_irq->
> > > __irq_resolve_mapping()
> > > 
> > > if (likely(data)) {
> > > desc = irq_data_to_desc(data);
> > > if (irq)
> > > *irq = data->irq;
> > > /* this IRQ is 112, IO-APIC-34 domain */
> > > }
> 
> 
> Could this all be a result of patch 4/5 in the Linux series ("[RFC
> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> using PHYSDEV ops instead of doing it natively from the IO-APIC?
> 
> FWIW, the introduced function in that patch
> (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> acpi_register_gsi_ioapic() without checking if the GSI is already
> registered, which might lead to multiple IRQs being allocated for the
> same underlying GSI?

I understand this point and I think it needs investigating.


> As I commented there, I think that approach is wrong.  If the GSI has
> not been mapped in Xen (because dom0 hasn't unmasked the respective
> IO-APIC pin) we should add some logic in the toolstack to map it
> before attempting to bind.

But this statement confuses me. The toolstack doesn't get involved in
IRQ setup for PCI devices for HVM guests? Keep in mind that this is a
regular HVM guest creation on PVH Dom0, so normally the IRQ setup is
done by QEMU, and QEMU already calls xc_physdev_map_pirq and
xc_domain_bind_pt_pci_irq. So I don't follow your statement about "the
toolstack to map it before attempting to bind".

Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi

2023-03-17 Thread Roger Pau Monné
On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > > >> So yes, it then all boils down to that Linux-
> > > >> internal question.
> > > > 
> > > > Excellent question but we'll have to wait for Ray as he is the one with
> > > > access to the hardware. But I have this data I can share in the
> > > > meantime:
> > > > 
> > > > [1.260378] IRQ to pin mappings:
> > > > [1.260387] IRQ1 -> 0:1
> > > > [1.260395] IRQ2 -> 0:2
> > > > [1.260403] IRQ3 -> 0:3
> > > > [1.260410] IRQ4 -> 0:4
> > > > [1.260418] IRQ5 -> 0:5
> > > > [1.260425] IRQ6 -> 0:6
> > > > [1.260432] IRQ7 -> 0:7
> > > > [1.260440] IRQ8 -> 0:8
> > > > [1.260447] IRQ9 -> 0:9
> > > > [1.260455] IRQ10 -> 0:10
> > > > [1.260462] IRQ11 -> 0:11
> > > > [1.260470] IRQ12 -> 0:12
> > > > [1.260478] IRQ13 -> 0:13
> > > > [1.260485] IRQ14 -> 0:14
> > > > [1.260493] IRQ15 -> 0:15
> > > > [1.260505] IRQ106 -> 1:8
> > > > [1.260513] IRQ112 -> 1:4
> > > > [1.260521] IRQ116 -> 1:13
> > > > [1.260529] IRQ117 -> 1:14
> > > > [1.260537] IRQ118 -> 1:15
> > > > [1.260544]  done.
> > > 
> > > And what does Linux think are IRQs 16 ... 105? Have you compared with
> > > Linux running baremetal on the same hardware?
> > 
> > So I have some emails from Ray from he time he was looking into this,
> > and on Linux dom0 PVH dmesg there is:
> > 
> > [0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec0, GSI 
> > 0-23
> > [0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 
> > 24-55
> > 
> > So it seems the vIO-APIC data provided by Xen to dom0 is at least
> > consistent.
> >  
> > > > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > > > 112 (which is the one causing issues):
> > > > 
> > > > __acpi_register_gsi->
> > > > acpi_register_gsi_ioapic->
> > > > mp_map_gsi_to_irq->
> > > > mp_map_pin_to_irq->
> > > > __irq_resolve_mapping()
> > > > 
> > > > if (likely(data)) {
> > > > desc = irq_data_to_desc(data);
> > > > if (irq)
> > > > *irq = data->irq;
> > > > /* this IRQ is 112, IO-APIC-34 domain */
> > > > }
> > 
> > 
> > Could this all be a result of patch 4/5 in the Linux series ("[RFC
> > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> > __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> > using PHYSDEV ops instead of doing it natively from the IO-APIC?
> > 
> > FWIW, the introduced function in that patch
> > (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> > acpi_register_gsi_ioapic() without checking if the GSI is already
> > registered, which might lead to multiple IRQs being allocated for the
> > same underlying GSI?
> 
> I understand this point and I think it needs investigating.
> 
> 
> > As I commented there, I think that approach is wrong.  If the GSI has
> > not been mapped in Xen (because dom0 hasn't unmasked the respective
> > IO-APIC pin) we should add some logic in the toolstack to map it
> > before attempting to bind.
> 
> But this statement confuses me. The toolstack doesn't get involved in
> IRQ setup for PCI devices for HVM guests?

It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
cold be removed (maybe for qemu-trad only?) or it's also required by
QEMU upstream, I would have to investigate more.  It's my
understanding it's in pci_add_dm_done() where Ray was getting the
mismatched IRQ vs GSI number.

Thanks, Roger.



[PATCH v2 0/2] Fixing error_interrupt()'s messages v2

2023-03-17 Thread Elliott Mitchell
Using a somewhat distinct strategy with the printk().  This is two
patches, since nominally the first can work with other strategies.  I
could see them being squashed during commit to the main repository.

Elliott Mitchell (2):
  x86/APIC: include full string with error_interrupt() error messages
  x86/APIC: modify error_interrupt() to output using single printk()

 xen/arch/x86/apic.c | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (|   ehem+sig...@drgnwing.com PGP 87145445   |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445






[PATCH v2 1/2] x86/APIC: include full string with error_interrupt() error messages

2023-03-17 Thread Elliott Mitchell
Rather than adding ", " with each printf(), simply include them in the
string initially.

Signed-off-by: Elliott Mitchell 
---
 xen/arch/x86/apic.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index f71474d47d..8cfb8cd71c 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1401,14 +1401,14 @@ static void cf_check spurious_interrupt(struct 
cpu_user_regs *regs)
 static void cf_check error_interrupt(struct cpu_user_regs *regs)
 {
 static const char *const esr_fields[] = {
-"Send CS error",
-"Receive CS error",
-"Send accept error",
-"Receive accept error",
-"Redirectable IPI",
-"Send illegal vector",
-"Received illegal vector",
-"Illegal register address",
+", Send CS error",
+", Receive CS error",
+", Send accept error",
+", Receive accept error",
+", Redirectable IPI",
+", Send illegal vector",
+", Received illegal vector",
+", Illegal register address",
 };
 unsigned int v, v1;
 int i;
@@ -1423,7 +1423,7 @@ static void cf_check error_interrupt(struct cpu_user_regs 
*regs)
 smp_processor_id(), v , v1);
 for ( i = 7; i >= 0; --i )
 if ( v1 & (1 << i) )
-printk(", %s", esr_fields[i]);
+printk("%s", esr_fields[i]);
 printk("\n");
 }
 
-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (|   ehem+sig...@drgnwing.com PGP 87145445   |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445






[PATCH v2 2/2] x86/APIC: modify error_interrupt() to output using single printk()

2023-03-17 Thread Elliott Mitchell
This takes care of the issue of APIC errors tending to occur on multiple
cores at one.  In turn this tends to causes the error messages to be
merged together, making understanding them difficult.

Signed-off-by: Elliott Mitchell 
---
 xen/arch/x86/apic.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 8cfb8cd71c..89abd1ea51 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1410,6 +1410,7 @@ static void cf_check error_interrupt(struct cpu_user_regs 
*regs)
 ", Received illegal vector",
 ", Illegal register address",
 };
+const char *entries[ARRAY_SIZE(esr_fields)];
 unsigned int v, v1;
 int i;
 
@@ -1419,12 +1420,12 @@ static void cf_check error_interrupt(struct 
cpu_user_regs *regs)
 v1 = apic_read(APIC_ESR);
 ack_APIC_irq();
 
-printk(XENLOG_DEBUG "APIC error on CPU%u: %02x(%02x)",
-smp_processor_id(), v , v1);
 for ( i = 7; i >= 0; --i )
-if ( v1 & (1 << i) )
-printk("%s", esr_fields[i]);
-printk("\n");
+entries[i] = v1 & (1 << i) ? esr_fields[i] : "";
+printk(XENLOG_DEBUG "APIC error on CPU%u: %02x(%02x)"
+"%s%s%s%s%s%s%s%s" "\n",
+smp_processor_id(), v , v1, entries[0], entries[1], entries[2],
+entries[3], entries[4], entries[5], entries[6], entries[7]);
 }
 
 /*
-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (|   ehem+sig...@drgnwing.com PGP 87145445   |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445






Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi

2023-03-17 Thread Stefano Stabellini
On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
> > On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > > > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > > > >> So yes, it then all boils down to that Linux-
> > > > >> internal question.
> > > > > 
> > > > > Excellent question but we'll have to wait for Ray as he is the one 
> > > > > with
> > > > > access to the hardware. But I have this data I can share in the
> > > > > meantime:
> > > > > 
> > > > > [1.260378] IRQ to pin mappings:
> > > > > [1.260387] IRQ1 -> 0:1
> > > > > [1.260395] IRQ2 -> 0:2
> > > > > [1.260403] IRQ3 -> 0:3
> > > > > [1.260410] IRQ4 -> 0:4
> > > > > [1.260418] IRQ5 -> 0:5
> > > > > [1.260425] IRQ6 -> 0:6
> > > > > [1.260432] IRQ7 -> 0:7
> > > > > [1.260440] IRQ8 -> 0:8
> > > > > [1.260447] IRQ9 -> 0:9
> > > > > [1.260455] IRQ10 -> 0:10
> > > > > [1.260462] IRQ11 -> 0:11
> > > > > [1.260470] IRQ12 -> 0:12
> > > > > [1.260478] IRQ13 -> 0:13
> > > > > [1.260485] IRQ14 -> 0:14
> > > > > [1.260493] IRQ15 -> 0:15
> > > > > [1.260505] IRQ106 -> 1:8
> > > > > [1.260513] IRQ112 -> 1:4
> > > > > [1.260521] IRQ116 -> 1:13
> > > > > [1.260529] IRQ117 -> 1:14
> > > > > [1.260537] IRQ118 -> 1:15
> > > > > [1.260544]  done.
> > > > 
> > > > And what does Linux think are IRQs 16 ... 105? Have you compared with
> > > > Linux running baremetal on the same hardware?
> > > 
> > > So I have some emails from Ray from he time he was looking into this,
> > > and on Linux dom0 PVH dmesg there is:
> > > 
> > > [0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec0, GSI 
> > > 0-23
> > > [0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 
> > > 24-55
> > > 
> > > So it seems the vIO-APIC data provided by Xen to dom0 is at least
> > > consistent.
> > >  
> > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ 
> > > > > ==
> > > > > 112 (which is the one causing issues):
> > > > > 
> > > > > __acpi_register_gsi->
> > > > > acpi_register_gsi_ioapic->
> > > > > mp_map_gsi_to_irq->
> > > > > mp_map_pin_to_irq->
> > > > > __irq_resolve_mapping()
> > > > > 
> > > > > if (likely(data)) {
> > > > > desc = irq_data_to_desc(data);
> > > > > if (irq)
> > > > > *irq = data->irq;
> > > > > /* this IRQ is 112, IO-APIC-34 domain */
> > > > > }
> > > 
> > > 
> > > Could this all be a result of patch 4/5 in the Linux series ("[RFC
> > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> > > using PHYSDEV ops instead of doing it natively from the IO-APIC?
> > > 
> > > FWIW, the introduced function in that patch
> > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> > > acpi_register_gsi_ioapic() without checking if the GSI is already
> > > registered, which might lead to multiple IRQs being allocated for the
> > > same underlying GSI?
> > 
> > I understand this point and I think it needs investigating.
> > 
> > 
> > > As I commented there, I think that approach is wrong.  If the GSI has
> > > not been mapped in Xen (because dom0 hasn't unmasked the respective
> > > IO-APIC pin) we should add some logic in the toolstack to map it
> > > before attempting to bind.
> > 
> > But this statement confuses me. The toolstack doesn't get involved in
> > IRQ setup for PCI devices for HVM guests?
> 
> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
> to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
> cold be removed (maybe for qemu-trad only?) or it's also required by
> QEMU upstream, I would have to investigate more.

You are right. I am not certain, but it seems like a mistake in the
toolstack to me. In theory, pci_add_dm_done should only be needed for PV
guests, not for HVM guests. I am not sure. But I can see the call to
xc_physdev_map_pirq you were referring to now.


> It's my understanding it's in pci_add_dm_done() where Ray was getting
> the mismatched IRQ vs GSI number.

I think the mismatch was actually caused by the xc_physdev_map_pirq call
from QEMU, which makes sense because in any case it should happen before
the same call done by pci_add_dm_done (pci_add_dm_done is called after
sending the pci passthrough QMP command to QEMU). So the first to hit
the IRQ!=GSI problem would be QEMU.

[qemu-mainline test] 179708: regressions - trouble: fail/pass/starved

2023-03-17 Thread osstest service owner
flight 179708 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/179708/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-libvirt  14 guest-start  fail REGR. vs. 179518
 test-amd64-i386-libvirt-xsm  14 guest-start  fail REGR. vs. 179518
 test-amd64-i386-libvirt-pair 25 guest-start/debian   fail REGR. vs. 179518
 test-amd64-amd64-xl-qcow212 debian-di-installfail REGR. vs. 179518
 test-amd64-amd64-libvirt-vhd 12 debian-di-installfail REGR. vs. 179518
 test-amd64-amd64-libvirt-xsm 14 guest-start  fail REGR. vs. 179518
 test-arm64-arm64-libvirt-xsm 14 guest-start  fail REGR. vs. 179518
 test-amd64-i386-libvirt-raw  12 debian-di-installfail REGR. vs. 179518
 test-amd64-amd64-libvirt 14 guest-start  fail REGR. vs. 179518
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install 
fail REGR. vs. 179518
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install 
fail REGR. vs. 179518
 test-arm64-arm64-libvirt-raw 12 debian-di-installfail REGR. vs. 179518
 test-amd64-i386-xl-vhd   12 debian-di-installfail REGR. vs. 179518
 test-arm64-arm64-xl-vhd  12 debian-di-installfail REGR. vs. 179518
 test-amd64-amd64-libvirt-pair 25 guest-start/debian  fail REGR. vs. 179518

Tests which are failing intermittently (not blocking):
 test-amd64-i386-pair 10 xen-install/src_host fail in 179682 pass in 179708
 test-amd64-i386-pair 11 xen-install/dst_host fail in 179682 pass in 179708
 test-amd64-i386-qemuu-rhel6hvm-amd 7 xen-install fail in 179682 pass in 179708
 test-amd64-i386-freebsd10-i386  7 xen-install  fail pass in 179682

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 179518
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 179518
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 179518
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 179518
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 179518
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   starved  n/a
 build-armhf-libvirt   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   starved  n/a
 build-armhf   2 hosts-allocate   starved  n/a

version targeted for testing:
 qemuu9636e513255362c4a329e3e5fb2c97dab3c5ce47
baseline version:
 qemuu7b0f0aa55fd292fa3489755a3a896e496c51ea86

Last test of basis   179518  2023-03-09 10:37:19 Z8 days
Failing since179526  2023-03-10 01:53:40 Z7 days   13 attempts
Testing same since   179682  2023-03-16 13:21:44 Z1 days2 attempts


People who touched revisions under test:
  Akihiko Odaki 
  Albert Esteve 
  Alex Bennée 
  Alex Williamson 
  Alistair Francis 
  Andreas Schwab 
  Anton Johansson 
  Avihai Horon 
  BALATON Zoltan 
  Bernhard Beschow 
  Carlos López 
  Cédric Le Goater 
  Cédric Le Goater 
  Damien Hedde 
  Daniel P. Berran

Re: [PATCH 1/2] automation: build 6.1.19 kernel for x86-64 dom0

2023-03-17 Thread Stefano Stabellini
On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> It will be used in tests added in subsequent patches.
> Enable config options needed for those tests.

Thanks for the patch! Looks great.

Can you also remove the old 5.10.74 Dockerfile
(automation/tests-artifacts/kernel/5.10.74.dockerfile), the
kernel-5.10.74-export job as well and replace kernel-5.10.74-export with
kernel-6.1.19-export as dependency for the qemu-alpine-x86_64-gcc test
job?

I think it makes sense to just keep kernel-6.1.19-export. I tested the
below already and it works fine.

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 2e1a6886df..f28c01fe0e 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -269,7 +269,7 @@ qemu-alpine-x86_64-gcc:
   needs:
 - alpine-3.12-gcc
 - alpine-3.12-rootfs-export
-- kernel-5.10.74-export
+- kernel-6.1.19-export



> Signed-off-by: Marek Marczykowski-Górecki 
> ---
>  automation/gitlab-ci/build.yaml | 11 -
>  automation/tests-artifacts/kernel/6.1.19.dockerfile | 40 ++-
>  2 files changed, 51 insertions(+)
>  create mode 100644 automation/tests-artifacts/kernel/6.1.19.dockerfile
> 
> diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
> index 38bb22d8609b..e1799d454c76 100644
> --- a/automation/gitlab-ci/build.yaml
> +++ b/automation/gitlab-ci/build.yaml
> @@ -790,3 +790,14 @@ kernel-5.10.74-export:
>- binaries/bzImage
>tags:
>  - x86_64
> +
> +kernel-6.1.19-export:
> +  extends: .test-jobs-artifact-common
> +  image: registry.gitlab.com/xen-project/xen/tests-artifacts/kernel:6.1.19
> +  script:
> +- mkdir binaries && cp /bzImage binaries/bzImage
> +  artifacts:
> +paths:
> +  - binaries/bzImage
> +  tags:
> +- x86_64
> diff --git a/automation/tests-artifacts/kernel/6.1.19.dockerfile 
> b/automation/tests-artifacts/kernel/6.1.19.dockerfile
> new file mode 100644
> index ..c2171555a0a6
> --- /dev/null
> +++ b/automation/tests-artifacts/kernel/6.1.19.dockerfile
> @@ -0,0 +1,40 @@
> +FROM debian:unstable
> +LABEL maintainer.name="The Xen Project" \
> +  maintainer.email="xen-devel@lists.xenproject.org"
> +
> +ENV DEBIAN_FRONTEND=noninteractive
> +ENV LINUX_VERSION=6.1.19
> +ENV USER root
> +
> +RUN mkdir /build
> +WORKDIR /build
> +
> +# build depends
> +RUN apt-get update && \
> +apt-get --quiet --yes install \
> +build-essential \
> +libssl-dev \
> +bc \
> +curl \
> +flex \
> +bison \
> +libelf-dev \
> +&& \
> +apt-get autoremove -y && \
> +apt-get clean && \
> +rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
> +
> +# Build the kernel
> +RUN curl -fsSLO 
> https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-"$LINUX_VERSION".tar.xz && 
> \
> +tar xvJf linux-"$LINUX_VERSION".tar.xz && \
> +cd linux-"$LINUX_VERSION" && \
> +make defconfig && \
> +make xen.config && \
> +scripts/config --enable BRIDGE && \
> +scripts/config --enable IGC && \
> +cp .config .config.orig && \
> +cat .config.orig | grep XEN | grep =m |sed 's/=m/=y/g' >> .config && \
> +make -j$(nproc) bzImage && \
> +cp arch/x86/boot/bzImage / && \
> +cd /build && \
> +rm -rf linux-"$LINUX_VERSION"*
> -- 
> git-series 0.9.1
> 

[ovmf test] 179730: all pass - PUSHED

2023-03-17 Thread osstest service owner
flight 179730 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/179730/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf b17a3a133b18fb41493fba7d86e9b5804ea6a8cf
baseline version:
 ovmf 410ca0ff94a42ee541dd6ceab70ea974eeb7e500

Last test of basis   179713  2023-03-17 08:15:15 Z0 days
Testing same since   179730  2023-03-17 18:22:13 Z0 days1 attempts


People who touched revisions under test:
  Rebecca Cran 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   410ca0ff94..b17a3a133b  b17a3a133b18fb41493fba7d86e9b5804ea6a8cf -> 
xen-tested-master



Re: [PATCH 2/2] automation: add a suspend test on an Alder Lake system

2023-03-17 Thread Stefano Stabellini
On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> This is a first test using Qubes OS CI infra. The gitlab-runner has
> access to ssh-based control interface (control@thor.testnet, ssh key
> exposed to the test via ssh-agent) and pre-configured HTTP dir for boot
> files (mapped under /scratch/gitlab-runner/tftp inside the container).
> Details about the setup are described on
> https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/
> 
> This test boots Xen, and try if S3 works. It runs on a ADL-based desktop
> system. The test script is based on the Xilinx one.
> 
> The machine needs newer kernel than other x86 tests run, so use 6.1.x
> kernel added in previous commit.
> 
> When building rootfs, use fakeroot to preserve device files when
> repacking rootfs archives in a non-privileged container.
> 
> Signed-off-by: Marek Marczykowski-Górecki 

This is awesome!! Thanks Marek!


> ---
> I'm bad at naming things. Things that I could use naming suggestions:
>  - test script (qubes-x86-64-suspend.sh) - this might be okay?
>  - test template job name (.adl-x86-64)
>  - test job name (adl-suspend-x86-64-gcc)
>  - tag (qubes-hw2)

I think these names are OK. I would maybe rename the tag "qubes-hw2" to
"qubes" because so far there is only one but I am fine with qubes-hw2
also.


> For context, our CI has several machines, named test-X or hwX, each
> controlled with matching hal900X RPi (this is where gitlab-runner is).
> This test uses only one specific hw, but I plan adding few others too.
> ---
>  automation/gitlab-ci/test.yaml |  31 -
>  automation/scripts/qubes-x86-64-suspend.sh | 155 ++-
>  2 files changed, 186 insertions(+)
>  create mode 100755 automation/scripts/qubes-x86-64-suspend.sh
> 
> diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> index 2e1a6886df7f..f5511dd6da70 100644
> --- a/automation/gitlab-ci/test.yaml
> +++ b/automation/gitlab-ci/test.yaml
> @@ -15,6 +15,10 @@
>  .arm32-test-needs: &arm32-test-needs
>- qemu-system-aarch64-6.0.0-arm32-export
>  
> +.x86-64-test-needs: &x86-64-test-needs
> +  - alpine-3.12-rootfs-export
> +  - kernel-6.1.19-export

As you know, the jobs starting with a "." are template jobs to avoid
repeating the same things over and over. .x86-64-test-needs could be
used in qemu-alpine-x86_64-gcc also.


>  .qemu-arm64:
>extends: .test-jobs-common
>variables:
> @@ -84,6 +88,25 @@
>tags:
>  - xilinx
>  
> +.adl-x86-64:
> +  extends: .test-jobs-common
> +  variables:
> +# the test controller runs on RPi4
> +CONTAINER: alpine:3.12-arm64v8
> +LOGFILE: smoke-test.log
> +  artifacts:
> +paths:
> +  - smoke.serial
> +  - '*.log'
> +when: always
> +  only:
> +variables:
> +  - $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"

Let me know which trees should have QUBES_JOBS set to true (thus able to
start Qubes jobs.) At a minimum, I think we would want
https://gitlab.com/xen-project/xen to test "staging" and "master". I can
set QUBES_JOBS to true to https://gitlab.com/xen-project/xen if you are
OK with it.


> +  before_script:
> +- apk --no-cache add openssh-client fakeroot

This would work but we typically try to avoid installing more packages
in the test jobs for speed and efficiency. Instead, add those packages
directly to the build container dockerfile
(automation/build/alpine/3.12-arm64v8.dockerfile), e.g.:

diff --git a/automation/build/alpine/3.12-arm64v8.dockerfile 
b/automation/build/alpine/3.12-arm64v8.dockerfile
index 180c978964..3f1e6a3fc6 100644
--- a/automation/build/alpine/3.12-arm64v8.dockerfile
+++ b/automation/build/alpine/3.12-arm64v8.dockerfile
@@ -41,3 +41,6 @@ RUN apk --no-cache add \
   libattr \
   libcap-ng-dev \
   pixman-dev \
+  # qubes test deps
+  openssh-client \
+  fakeroot \

But I am not sure why you need fakeroot, more questions below about it


> +  tags:
> +- qubes-hw2
> +
>  # Test jobs
>  build-each-commit-gcc:
>extends: .test-jobs-common
> @@ -110,6 +133,14 @@ xilinx-smoke-dom0less-arm64-gcc:
>  - *arm64-test-needs
>  - alpine-3.12-gcc-arm64
>  
> +adl-suspend-x86-64-gcc:
> +  extends: .adl-x86-64
> +  script:
> +- ./automation/scripts/qubes-x86-64-suspend.sh s3 2>&1 | tee ${LOGFILE}
> +  needs:
> +- *x86-64-test-needs
> +- alpine-3.12-gcc

This up to you, but qubes-x86-64-suspend.sh is very cool and
sophisticated and I would use it to create at least 2 jobs:
- a plain dom0 + domU boot (no suspend)
- a suspend/resume job

I am happy either way


>  qemu-smoke-dom0-arm64-gcc:
>extends: .qemu-arm64
>script:
> diff --git a/automation/scripts/qubes-x86-64-suspend.sh 
> b/automation/scripts/qubes-x86-64-suspend.sh
> new file mode 100755
> index ..fc479c9faaec
> --- /dev/null
> +++ b/automation/scripts/qubes-x86-64-suspend.sh
> @@ -0,0 +1,155 @@
> +#!/bin/sh
> +
> +set -ex
> +
> +test_variant=$1
> +
> +wait_and_wakeup=
> +if [ -z "${te

Re: [RFC QEMU PATCH 12/18] softmmu: Fix the size to map cache with xen for host virtual address

2023-03-17 Thread Stefano Stabellini
On Sun, 12 Mar 2023, Huang Rui wrote:
> The xen_map_cache function wants to pass offset and size of this memory
> block as the input parameters to map the host virtual address. However,
> block->offset is too large as 0x1 (4G), if we assign the size as
> block->max_length (0x11000), the mapped host address will be out of
> block->max_length and easy to overflow.

Hi Ray,

Is this patch still required after all the other fixes?

If it is required, where is the overflow that it is trying to prevent?
Is it a failure in the hypercall mapping the memory to QEMU
(hw/i386/xen/xen-mapcache.c:xen_remap_bucket) ?


> We have to assign the size as
> (block->max_length - block->offset), then that is able to ensure the
> address will be located in legal range inside of max_length.
>
> {rcu = {next = 0x0, func = 0x0}, mr = 0x5681b620, host = 0x0,
> colo_cache = 0x0, offset = 0x1, used_length = 0x11000,
> max_length = 0x11000, resized = 0x0, flags = 0x10, idstr = {0x78,
> 0x65, 0x6e, 0x2e, 0x72, 0x61, 0x6d, 0x0 }, next = {
> le_next = 0x568c61b0, le_prev = 0x5681c640},
> ramblock_notifiers = {lh_first = 0x0}, fd = 0x, page_size =
> 0x1000, bmap = 0x0, receivedmap = 0x0, clear_bmap = 0x0,
> clear_bmap_shift = 0x0, postcopy_length = 0x0}
> 
> Signed-off-by: Huang Rui 
> ---
>  softmmu/physmem.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 1b606a3002..1b0bb35da9 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -2304,7 +2304,7 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t 
> addr)
>  return xen_map_cache(addr, 0, 0, false);
>  }
>  
> -block->host = xen_map_cache(block->offset, block->max_length, 1, 
> false);
> + block->host = xen_map_cache(block->offset, block->max_length, 1, false);

Coding style: indentation is 4 spaces. In any case, this looks like a
spurious change?


>  }
>  return ramblock_ptr(block, addr);
>  }
> @@ -2337,7 +2337,8 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
> ram_addr_t addr,
>  return xen_map_cache(addr, *size, lock, lock);
>  }
>  
> -block->host = xen_map_cache(block->offset, block->max_length, 1, 
> lock);
> + block->host = xen_map_cache(block->offset,
> + block->max_length - block->offset, 1, lock);
>  }
>  return ramblock_ptr(block, addr);


block->offset is the address of the beginning of the block, and
block->max_length is the size. Here the behavior is theoretically
correct: if block->host is not set (not mapped in QEMU yet), then call
xen_map_cache to map the entire block from beginning to end, setting
block->host with a pointer to the beginning of the mapped area in QEMU.
>From that point onward, ramblock_ptr() will then behave correctly.

Of course if xen_map_cache fails to map the entire region at once
because it is too large or other error, then we have a big problem.

But I think in that case this patch would still cause issues. In this
example offset (start of the ramblock) is 0x1, and max_length
(size of the ramblock) is 0x11000. So with this change we are
mapping 0x11000-0x1 = 0x1000 which is only the first
256MB of the region which is more than 4GB.

What happens the next time qemu_ram_ptr_length is called for an address
above the first 256MB? It will break because block->host != NULL so the
function will behave as if the entire ramblock is mapped in QEMU while
it is not (only the first 256MB are). ramblock_ptr will return
block->host + something-more-than-256MB which is actually invalid.


I think we would need more something along this line where we fall back
to temporary mappings of a smaller region if we can't map it all at once.
MAX_SIZE would be the max size where a single mapping still succeeds,
maybe 4GB?

if (block->offset == 0 || block->max_length > MAX_SIZE) {
return xen_map_cache(addr, *size, lock, lock);
}


Otherwise, maybe the error could be due to max_length being incorrect to
begin with?



Re: [RFC QEMU PATCH 13/18] hw/i386/xen/xen-hvm: Introduce xen_ram_block_check function

2023-03-17 Thread Stefano Stabellini
On Sun, 12 Mar 2023, Huang Rui wrote:
> Introduce xen_ram_block_check function to check whether current ramblock
> is xen ram memory.
> 
> Signed-off-by: Huang Rui 
> ---
>  hw/i386/xen/xen-hvm.c | 15 +++
>  include/hw/xen/xen.h  |  1 +
>  2 files changed, 16 insertions(+)
> 
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index e4293d6d66..a4f12aefce 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -32,6 +32,7 @@
>  #include "sysemu/xen.h"
>  #include "sysemu/xen-mapcache.h"
>  #include "trace.h"
> +#include "include/exec/ramblock.h"
>  
>  #include 
>  #include 
> @@ -1564,6 +1565,20 @@ void xen_register_framebuffer(MemoryRegion *mr)
>  framebuffer = mr;
>  }
>  
> +bool xen_ram_block_check(RAMBlock *rb)
> +{
> + bool ret;
> +
> + if (!rb)
> + return false;
> +
> + ret = (rb == ram_memory.ram_block);
> + if (ret)
> + rb->offset = 0;

I take that this is needed because there is a ramblock that is
ram_memory but with offset != 0?  So it would fail the block->offset ==
0 check in qemu_ram_ptr_length (which is meant to capture all accesses
to ram_memory, but failing at it)?

If so, would it be possible to just do this instead:


diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index fb412a56e1..3e2640dabd 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2149,7 +2149,7 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
ram_addr_t addr,
  * because we don't want to map the entire memory in QEMU.
  * In that case just map the requested area.
  */
-if (block->offset == 0) {
+if (block->offset == 0 || block == ram_memory.ram_block) {
 return xen_map_cache(addr, *size, lock, lock);
 }
 


> + return ret;
> +}
> +
>  void xen_shutdown_fatal_error(const char *fmt, ...)
>  {
>  va_list ap;
> diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> index afdf9c436a..99a383eb17 100644
> --- a/include/hw/xen/xen.h
> +++ b/include/hw/xen/xen.h
> @@ -31,5 +31,6 @@ qemu_irq *xen_interrupt_controller_init(void);
>  void xenstore_store_pv_console_info(int i, Chardev *chr);
>  
>  void xen_register_framebuffer(struct MemoryRegion *mr);
> +bool xen_ram_block_check(RAMBlock *rb);
>  
>  #endif /* QEMU_HW_XEN_H */
> -- 
> 2.25.1
> 



Re: [PATCH 2/2] automation: add a suspend test on an Alder Lake system

2023-03-17 Thread Marek Marczykowski-Górecki
On Fri, Mar 17, 2023 at 04:10:22PM -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Marek Marczykowski-Górecki wrote:
> > This is a first test using Qubes OS CI infra. The gitlab-runner has
> > access to ssh-based control interface (control@thor.testnet, ssh key
> > exposed to the test via ssh-agent) and pre-configured HTTP dir for boot
> > files (mapped under /scratch/gitlab-runner/tftp inside the container).
> > Details about the setup are described on
> > https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/
> > 
> > This test boots Xen, and try if S3 works. It runs on a ADL-based desktop
> > system. The test script is based on the Xilinx one.
> > 
> > The machine needs newer kernel than other x86 tests run, so use 6.1.x
> > kernel added in previous commit.
> > 
> > When building rootfs, use fakeroot to preserve device files when
> > repacking rootfs archives in a non-privileged container.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki 
> 
> This is awesome!! Thanks Marek!
> 
> 
> > ---
> > I'm bad at naming things. Things that I could use naming suggestions:
> >  - test script (qubes-x86-64-suspend.sh) - this might be okay?
> >  - test template job name (.adl-x86-64)
> >  - test job name (adl-suspend-x86-64-gcc)
> >  - tag (qubes-hw2)
> 
> I think these names are OK. I would maybe rename the tag "qubes-hw2" to
> "qubes" because so far there is only one but I am fine with qubes-hw2
> also.

I have 10 of them (and growing), so I'd like to keep tag name at least
somehow referencing which runner it uses. For example, this one is
a desktop with Alder Lake, but some other tests I may want to use a laptop
with Tiger Lake, for example. The mapping name -> hw spec isn't publicly
written down (although our openQA publishes all kind of logs from them,
so it's possible to infer this info).

> > For context, our CI has several machines, named test-X or hwX, each
> > controlled with matching hal900X RPi (this is where gitlab-runner is).
> > This test uses only one specific hw, but I plan adding few others too.
> > ---
> >  automation/gitlab-ci/test.yaml |  31 -
> >  automation/scripts/qubes-x86-64-suspend.sh | 155 ++-
> >  2 files changed, 186 insertions(+)
> >  create mode 100755 automation/scripts/qubes-x86-64-suspend.sh
> > 
> > diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> > index 2e1a6886df7f..f5511dd6da70 100644
> > --- a/automation/gitlab-ci/test.yaml
> > +++ b/automation/gitlab-ci/test.yaml
> > @@ -15,6 +15,10 @@
> >  .arm32-test-needs: &arm32-test-needs
> >- qemu-system-aarch64-6.0.0-arm32-export
> >  
> > +.x86-64-test-needs: &x86-64-test-needs
> > +  - alpine-3.12-rootfs-export
> > +  - kernel-6.1.19-export
> 
> As you know, the jobs starting with a "." are template jobs to avoid
> repeating the same things over and over. .x86-64-test-needs could be
> used in qemu-alpine-x86_64-gcc also.

Right, when switching all of them to 6.1 kernel, that makes sense.

> >  .qemu-arm64:
> >extends: .test-jobs-common
> >variables:
> > @@ -84,6 +88,25 @@
> >tags:
> >  - xilinx
> >  
> > +.adl-x86-64:
> > +  extends: .test-jobs-common
> > +  variables:
> > +# the test controller runs on RPi4
> > +CONTAINER: alpine:3.12-arm64v8
> > +LOGFILE: smoke-test.log
> > +  artifacts:
> > +paths:
> > +  - smoke.serial
> > +  - '*.log'
> > +when: always
> > +  only:
> > +variables:
> > +  - $QUBES_JOBS == "true" && $CI_COMMIT_REF_PROTECTED == "true"
> 
> Let me know which trees should have QUBES_JOBS set to true (thus able to
> start Qubes jobs.) At a minimum, I think we would want
> https://gitlab.com/xen-project/xen to test "staging" and "master". I can
> set QUBES_JOBS to true to https://gitlab.com/xen-project/xen if you are
> OK with it.

Yes, that's perfectly okay. I'd like at least also staging-4.17, but
depending on push frequency other staging-* are probably fine too (I
very much doubt long queue would be an issue). Of course that assumes
those tests would be backported, which I'm not sure if is planned. I'm
also okay with allowing committers and/or other maintainers to use it on
demand, but preferably not all patchew branches.

BTW, if you trigger job manually via a web interface or API, you can
specify variables for just a single pipeline. And if you use that
feature, you can also make gitlab present you a bit more convenient
interface with variables listed already. See example here:
https://gitlab.com/QubesOS/qubes-continuous-integration/-/blob/main/.gitlab-ci.yml#L15-26
This could be useful to start only a subset of tests.

> > +  before_script:
> > +- apk --no-cache add openssh-client fakeroot
> 
> This would work but we typically try to avoid installing more packages
> in the test jobs for speed and efficiency. 

Yes, I was being lazy and wanted to avoid rebuilding container (and
changing from where it's pulled) when developing the test. Sure, will
fold t

[xen-unstable test] 179717: tolerable trouble: fail/pass/starved

2023-03-17 Thread osstest service owner
flight 179717 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/179717/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-install fail in 179697 pass in 
179717
 test-amd64-i386-examine-uefi  6 xen-installfail pass in 179697
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-install  fail pass in 179697
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat  fail pass in 179697

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 179697
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 179697
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 179697
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 179697
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 179697
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 179697
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 179697
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 179697
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 179697
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 build-armhf-libvirt   1 build-check(1)   starved  n/a
 test-armhf-armhf-examine  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   starved  n/a
 build-armhf   2 hosts-allocate   starved  n/a

version targeted for testing:
 xen  36e49fc8cbad21a4308b4701caaa685ef04e120b
baseline version:
 xen  36e49fc8cbad21a4308b4701caaa685ef04e120b

Last test of basis   179717  2023-03-17 11:14:24 Z0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64 

[linux-5.4 test] 179725: regressions - trouble: fail/pass/starved

2023-03-17 Thread osstest service owner
flight 179725 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/179725/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 179598

Tests which are failing intermittently (not blocking):
 test-amd64-i386-pair 11 xen-install/dst_host   fail pass in 179711
 test-amd64-i386-xl-vhd   21 guest-start/debian.repeat  fail pass in 179711

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 179598
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 179598
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 179598
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 179598
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 179598
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 179598
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 179598
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 179598
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 179598
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw  1 build-check(1)   starved in 179711 n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   starved in 179711 n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   starved in 179711 n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   starved in 179711 n/a
 test-arm64-arm64-examine  1 build-check(1)   starved in 179711 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   starved in 179711 n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   starved in 179711 n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   starved in 179711 n/a
 test-arm64-arm64-xl   1 build-check(1)   starved in 179711 n/a
 build-arm64-pvops 2 hosts-allocate   starved in 179711 n/a
 build-armhf-libvirt   1 build-check(1)   starved  n/a
 test-armhf-armhf-examine  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   starved  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   starved  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   starved  n/a
 b