Re: [PATCH v2] livepatch: set -f{function,data}-sections compiler option

2022-03-08 Thread Roger Pau Monné
On Mon, Mar 07, 2022 at 06:07:00PM +0100, Jan Beulich wrote:
> On 07.03.2022 17:36, Roger Pau Monné wrote:
> > On Mon, Mar 07, 2022 at 05:28:20PM +0100, Jan Beulich wrote:
> >> On 07.03.2022 16:55, Roger Pau Monne wrote:
> >>> If livepatching support is enabled build the hypervisor with
> >>> -f{function,data}-sections compiler options, which is required by the
> >>> livepatching tools to detect changes and create livepatches.
> >>>
> >>> This shouldn't result in any functional change on the hypervisor
> >>> binary image, but does however require some changes in the linker
> >>> script in order to handle that each function and data item will now be
> >>> placed into its own section in object files. As a result add catch-all
> >>> for .text, .data and .bss in order to merge each individual item
> >>> section into the final image.
> >>>
> >>> The main difference will be that .text.startup will end up being part
> >>> of .text rather than .init, and thus won't be freed. .text.exit will
> >>> also be part of .text rather than dropped. Overall this could make the
> >>> image bigger, and package some .text code in a sub-optimal way.
> >>>
> >>> Note that placement of the sections inside of .text is also slightly
> >>> adjusted to be more similar to the position found in the default GNU
> >>> ld linker script. This requires having a separate section for the
> >>> header in order to place it at the beginning of the output image,
> >>> followed by the unlikely and related sections, and finally the main
> >>> .text section.
> >>>
> >>> On Arm the .data.read_mostly needs to be moved ahead of the .data
> >>> section like it's already done on x86, and the alignment needs to be
> >>> set to PAGE_SIZE so it doesn't end up being placed at the tail of a
> >>> read-only page from the previous section. While there move the
> >>> alignment of the .data section ahead of the section declaration, like
> >>> it's done for other sections.
> >>>
> >>> The benefit of having CONFIG_LIVEPATCH enable those compiler options
> >>> is that the livepatch build tools no longer need to fiddle with the
> >>> build system in order to enable them. Note the current livepatch tools
> >>> are broken after the recent build changes due to the way they
> >>> attempt to set -f{function,data}-sections.
> >>>
> >>> Signed-off-by: Roger Pau Monné 
> >>
> >> The x86 part of this looks fine to me, just one other remark:
> >>
> >>> --- a/xen/common/Kconfig
> >>> +++ b/xen/common/Kconfig
> >>> @@ -350,10 +350,14 @@ source "common/sched/Kconfig"
> >>>  config CRYPTO
> >>>   bool
> >>>  
> >>> +config CC_SPLIT_SECTIONS
> >>> + bool
> >>
> >> I think this wants to live higher up in the file, perhaps between
> >> ALTERNATIVE_CALL and HAS_ALTERNATIVE. (CRYPTO, as seen in context
> >> here, imo also would better live higher up.) Or alternatively it may
> >> want to move to xen/Kconfig, next to CC_HAS_VISIBILITY_ATTRIBUTE.
> > 
> > I was tempted to place it in xen/Kconfig. The logic for the current
> > suggested placement is to be closer to its current only user
> > (LIVEPATCH), but I'm not opposed to moving it somewhere else if
> > there's consensus.
> 
> I guess the main question is whether this option is intended to gain
> a prompt. If so, xen/common/Kconfig is likely the better place. If
> not, I think it better fits in xen/Kconfig.

I think it's unlikely for it to gain a prompt; other options selecting
it is what I would expect.

Will move to xen/Kconfig unless someone objects.

Thanks, Roger.
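For readers following the thread: the mechanism under discussion is a promptless Kconfig symbol that other options select, wired to compiler flags in the Makefile. A rough sketch assembled from the quoted patch (the placement of CC_SPLIT_SECTIONS is exactly what is being debated above, and LIVEPATCH has more dependencies than shown):

```
# Kconfig (sketch): promptless helper symbol, enabled only via "select"
config CC_SPLIT_SECTIONS
	bool

config LIVEPATCH
	bool "Live patching support"
	select CC_SPLIT_SECTIONS

# xen/Makefile (from the quoted patch): translate the symbol into flags
CFLAGS-$(CONFIG_CC_SPLIT_SECTIONS) += -ffunction-sections -fdata-sections
```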



Re: [PATCH v2 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Jan Beulich
On 07.03.2022 22:13, Bjoern Doebel wrote:
> @@ -159,7 +200,11 @@ void noinline arch_livepatch_apply(struct livepatch_func 
> *func)
>   */
>  void noinline arch_livepatch_revert(const struct livepatch_func *func)
>  {
> -memcpy(func->old_addr, func->opaque, livepatch_insn_len(func));
> +struct x86_livepatch_meta *lp;

Repeating my comment here a 3rd time (sorry): const please (and
generally wherever possible).

Jan

> +lp = (struct x86_livepatch_meta *)func->opaque;
> +
> +memcpy(func->old_addr + lp->patch_offset, lp->instruction, 
> livepatch_insn_len(func));
>  }
>  
>  /*




Re: [PATCH v2] livepatch: set -f{function,data}-sections compiler option

2022-03-08 Thread Roger Pau Monné
On Mon, Mar 07, 2022 at 05:19:53PM +, Julien Grall wrote:
> Hi Roger,
> 
> On 07/03/2022 15:55, Roger Pau Monne wrote:
> > If livepatching support is enabled build the hypervisor with
> > -f{function,data}-sections compiler options, which is required by the
> > livepatching tools to detect changes and create livepatches.
> > 
> > This shouldn't result in any functional change on the hypervisor
> > binary image, but does however require some changes in the linker
> > script in order to handle that each function and data item will now be
> > placed into its own section in object files. As a result add catch-all
> > for .text, .data and .bss in order to merge each individual item
> > section into the final image.
> > 
> > The main difference will be that .text.startup will end up being part
> > of .text rather than .init, and thus won't be freed. .text.exit will
> > also be part of .text rather than dropped. Overall this could make the
> > image bigger, and package some .text code in a sub-optimal way.
> > 
> > Note that placement of the sections inside of .text is also slightly
> > adjusted to be more similar to the position found in the default GNU
> > ld linker script. This requires having a separate section for the
> > header in order to place it at the beginning of the output image,
> > followed by the unlikely and related sections, and finally the main
> > .text section.
> > 
> > On Arm the .data.read_mostly needs to be moved ahead of the .data
> > section like it's already done on x86, and the alignment needs to be
> > set to PAGE_SIZE so it doesn't end up being placed at the tail of a
> > read-only page from the previous section. While there move the
> > alignment of the .data section ahead of the section declaration, like
> > it's done for other sections.
> 
> This sounds like a bug not related to this patch. Can this be split
> separately?

No, .data.read_mostly needs to be moved before .data so that the
catch-all .data.* added to .data to account for -fdata-sections doesn't
end up also including .data.read_mostly.
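Input-section wildcards in a linker script match in the order they appear, which is why this ordering matters. A sketch of the resulting layout (illustrative, not the exact patch hunk):

```
/* .data.read_mostly must be collected before the .data catch-all below,
 * otherwise the .data.* wildcard would claim it.  ALIGN(PAGE_SIZE) keeps
 * it off the tail of a read-only page from the previous section. */
. = ALIGN(PAGE_SIZE);
.data.read_mostly : {
    *(.data.read_mostly)
}

. = ALIGN(PAGE_SIZE);
.data : {
    *(.data .data.*)        /* catch-all for -fdata-sections output */
}
```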

> > 
> > The benefit of having CONFIG_LIVEPATCH enable those compiler options
> > is that the livepatch build tools no longer need to fiddle with the
> > build system in order to enable them. Note the current livepatch tools
> > are broken after the recent build changes due to the way they
> > attempt to set -f{function,data}-sections.
> > 
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Changes since v1:
> >   - Introduce CC_SPLIT_SECTIONS for selecting the compiler options.
> >   - Drop check for compiler options, all supported versions have them.
> >   - Re-arrange section placement in .text, to match the default linker
> > script.
> >   - Introduce .text.header to contain the headers bits that must appear
> > first in the final binary.
> > ---
> > Given that now the header is explicitly placed in .text.header, it's
> > likely possible to simplify some of the ordering of the object files
> > for the prelink.o generation, as arch/x86/boot/built_in.o no longer
> > needs to be the first object file in the list.
> > 
> > It also seems on Arm the schedulers and hypfs .data sections should be
> > moved into read_mostly.
> > ---
> > Tested by gitlab in order to assert I didn't introduce any regression
> > on Arm specially.
> > ---
> >   xen/Makefile  |  2 ++
> >   xen/arch/arm/arm32/head.S |  1 +
> >   xen/arch/arm/arm64/head.S |  1 +
> >   xen/arch/arm/xen.lds.S| 49 +--
> >   xen/arch/x86/boot/head.S  |  2 +-
> >   xen/arch/x86/xen.lds.S| 22 +++---
> >   xen/common/Kconfig|  4 
> >   7 files changed, 49 insertions(+), 32 deletions(-)
> > 
> > diff --git a/xen/Makefile b/xen/Makefile
> > index 5c21492d6f..18a4f7e101 100644
> > --- a/xen/Makefile
> > +++ b/xen/Makefile
> > @@ -273,6 +273,8 @@ else
> >   CFLAGS += -fomit-frame-pointer
> >   endif
> > +CFLAGS-$(CONFIG_CC_SPLIT_SECTIONS) += -ffunction-sections -fdata-sections
> > +
> >   CFLAGS += -nostdinc -fno-builtin -fno-common
> >   CFLAGS += -Werror -Wredundant-decls -Wno-pointer-arith
> >   $(call cc-option-add,CFLAGS,CC,-Wvla)
> > diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
> > index 7a906167ef..c837d3054c 100644
> > --- a/xen/arch/arm/arm32/head.S
> > +++ b/xen/arch/arm/arm32/head.S
> > @@ -120,6 +120,7 @@
> >   #endif /* !CONFIG_EARLY_PRINTK */
> > +.section .text.header, "ax", %progbits
> >   .arm
> >   /*
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 66d862fc81..e62c48ec1c 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -133,6 +133,7 @@
> >   add \xb, \xb, x20
> >   .endm
> > +.section .text.header, "ax", %progbits
> >   /*.aarch64*/
> >   /*
> > diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> > index 08016948ab..836da880c3 100644
> > --- a/xen/arch/arm/xen.lds.S
> > +++ b/x

Re: [PATCH] x86/kexec: Fix kexec-reboot with CET active

2022-03-08 Thread Jan Beulich
On 07.03.2022 21:53, Andrew Cooper wrote:
> --- a/xen/arch/x86/machine_kexec.c
> +++ b/xen/arch/x86/machine_kexec.c
> @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image)
>   */
>  local_irq_disable();
>  
> +/* Reset CPUID masking and faulting to the host's default. */
> +ctxt_switch_levelling(NULL);
> +
> +/* Disable CET. */
> +if ( read_cr4() & X86_CR4_CET )
> +{
> +wrmsrl(MSR_S_CET, 0);
> +write_cr4(read_cr4() & ~X86_CR4_CET);
> +}
> +
>  /* Now regular interrupts are disabled, we need to reduce the impact
>   * of interrupts not disabled by 'cli'.
>   *

Besides introducing somewhat of a disconnect between the comment in
context here and the earlier local_irq_disable(), is it really
necessary to do both actions with IRQs off?

Jan




Re: [PATCH v2 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Doebel, Bjoern



On 08.03.22 09:07, Jan Beulich wrote:




On 07.03.2022 22:13, Bjoern Doebel wrote:

@@ -159,7 +200,11 @@ void noinline arch_livepatch_apply(struct livepatch_func 
*func)
   */
  void noinline arch_livepatch_revert(const struct livepatch_func *func)
  {
-memcpy(func->old_addr, func->opaque, livepatch_insn_len(func));
+struct x86_livepatch_meta *lp;


Repeating my comment here a 3rd time (sorry): const please (and
generally wherever possible).


That's embarrassing... ;) Sorry. Will fix.

Bjoern



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH] x86/kexec: Use noreturn attributes, and drop unreachable code

2022-03-08 Thread Jan Beulich
On 07.03.2022 22:02, Andrew Cooper wrote:
> kexec_reloc() does not return.  Plumbing this property upwards lets us mark
> machine_kexec() and machine_reboot_kexec() noreturn too.  This in turn lets us
> drop some unreachable BUG()/return statements.

I'm certainly fine with the added attributes. I'm less convinced of the
removal of BUG() - I'd prefer to leave to the compiler elimination of
these when found to be dead, just to be on the safe side. And I'm pretty
uncertain about the removal of "return", when it comes to old enough
compilers.

> @@ -403,9 +401,6 @@ static long cf_check kexec_reboot(void *_image)
>  
>  kexec_common_shutdown();
>  machine_reboot_kexec(image);
> -
> -BUG();
> -return 0;
>  }

In principle this function now is also "noreturn", but I think I
vaguely recall some compilers warning about "noreturn" on a function
whose return type is other than void.

Jan




Re: [PATCH v4] vpci/msix: fix PBA accesses

2022-03-08 Thread Jan Beulich
On 07.03.2022 17:37, Roger Pau Monne wrote:
> Map the PBA in order to access it from the MSI-X read and write
> handlers. Note that previously the handlers would pass the physical
> host address into the {read,write}{l,q} handlers, which is wrong as
> those expect a linear address.
> 
> Map the PBA using ioremap when the first access is performed. Note
> that 32bit arches might want to abstract the call to ioremap into a
> vPCI arch handler, so they can use a fixmap range to map the PBA.
> 
> Reported-by: Jan Beulich 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 

> Cc: Alex Olson 

I'll wait a little with committing, in the hope for Alex to re-provide
a Tested-by.

> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -182,6 +182,38 @@ static struct vpci_msix_entry *get_entry(struct 
> vpci_msix *msix,
>  return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
>  }
>  
> +static void __iomem *get_pba(struct vpci *vpci)
> +{
> +struct vpci_msix *msix = vpci->msix;
> +/*
> + * PBA will only be unmapped when the device is deassigned, so access it
> + * without holding the vpci lock.
> + */
> +void __iomem *pba = read_atomic(&msix->pba);
> +
> +if ( likely(pba) )
> +return pba;
> +
> +pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
> +  vmsix_table_size(vpci, VPCI_MSIX_PBA));
> +if ( !pba )
> +return read_atomic(&msix->pba);
> +
> +spin_lock(&vpci->lock);
> +if ( !msix->pba )
> +{
> +write_atomic(&msix->pba, pba);
> +spin_unlock(&vpci->lock);
> +}
> +else
> +{
> +spin_unlock(&vpci->lock);
> +iounmap(pba);
> +}

TBH I had been hoping for just a single spin_unlock(), but you're
the maintainer of this code ...

Jan




Re: [PATCH v3 00/13] xen: drop hypercall function tables

2022-03-08 Thread Jan Beulich
On 08.12.2021 16:55, Juergen Gross wrote:
> In order to avoid indirect function calls on the hypercall path as
> much as possible this series is removing the hypercall function tables
> and is replacing the hypercall handler calls via the function array
> by automatically generated call macros.
> 
> Another by-product of generating the call macros is the automatic
> generation of the hypercall handler prototypes from the same database
> which is used to generate the macros.
> 
> This has the additional advantage of using type-safe calls of the
> handlers and of ensuring that related handlers (e.g. PV and HVM ones)
> share the same prototypes.
> 
> A very brief performance test (parallel build of the Xen hypervisor
> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
> the performance with the patches applied. The test was performed using
> a PV and a PVH guest.
> 
> Changes in V2:
> - new patches 6, 14, 15
> - patch 7: support hypercall priorities for faster code
> - comments addressed
> 
> Changes in V3:
> - patches 1 and 4 removed as already applied
> - comments addressed
> 
> Juergen Gross (13):
>   xen: move do_vcpu_op() to arch specific code
>   xen: harmonize return types of hypercall handlers
>   xen: don't include asm/hypercall.h from C sources
>   xen: include compat/platform.h from hypercall.h
>   xen: generate hypercall interface related code
>   xen: use generated prototypes for hypercall handlers
>   x86/pv-shim: don't modify hypercall table
>   xen/x86: don't use hypercall table for calling compat hypercalls
>   xen/x86: call hypercall handlers via generated macro
>   xen/arm: call hypercall handlers via generated macro
>   xen/x86: add hypercall performance counters for hvm, correct pv
>   xen: drop calls_to_multicall performance counter
>   tools/xenperf: update hypercall names

As it's pretty certain now that parts of this which didn't go in yet will
need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
expecting a v4 instead.

Jan




Re: [PATCH v3 00/13] xen: drop hypercall function tables

2022-03-08 Thread Juergen Gross

On 08.03.22 09:34, Jan Beulich wrote:

On 08.12.2021 16:55, Juergen Gross wrote:

In order to avoid indirect function calls on the hypercall path as
much as possible this series is removing the hypercall function tables
and is replacing the hypercall handler calls via the function array
by automatically generated call macros.

Another by-product of generating the call macros is the automatic
generation of the hypercall handler prototypes from the same database
which is used to generate the macros.

This has the additional advantage of using type-safe calls of the
handlers and of ensuring that related handlers (e.g. PV and HVM ones)
share the same prototypes.

A very brief performance test (parallel build of the Xen hypervisor
in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
the performance with the patches applied. The test was performed using
a PV and a PVH guest.

Changes in V2:
- new patches 6, 14, 15
- patch 7: support hypercall priorities for faster code
- comments addressed

Changes in V3:
- patches 1 and 4 removed as already applied
- comments addressed

Juergen Gross (13):
   xen: move do_vcpu_op() to arch specific code
   xen: harmonize return types of hypercall handlers
   xen: don't include asm/hypercall.h from C sources
   xen: include compat/platform.h from hypercall.h
   xen: generate hypercall interface related code
   xen: use generated prototypes for hypercall handlers
   x86/pv-shim: don't modify hypercall table
   xen/x86: don't use hypercall table for calling compat hypercalls
   xen/x86: call hypercall handlers via generated macro
   xen/arm: call hypercall handlers via generated macro
   xen/x86: add hypercall performance counters for hvm, correct pv
   xen: drop calls_to_multicall performance counter
   tools/xenperf: update hypercall names


As it's pretty certain now that parts of this which didn't go in yet will
need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
expecting a v4 instead.


Yes, I was planning to spin that up soon.

The main remaining question is whether we want to switch the return type
of all hypercalls (or at least the ones common to all archs) not
requiring to return 64-bit values to "int", as Julien requested.


Juergen




Re: [PATCH v3 00/13] xen: drop hypercall function tables

2022-03-08 Thread Jan Beulich
On 08.03.2022 09:39, Juergen Gross wrote:
> On 08.03.22 09:34, Jan Beulich wrote:
>> On 08.12.2021 16:55, Juergen Gross wrote:
>>> In order to avoid indirect function calls on the hypercall path as
>>> much as possible this series is removing the hypercall function tables
>>> and is replacing the hypercall handler calls via the function array
>>> by automatically generated call macros.
>>>
> >>> Another by-product of generating the call macros is the automatic
> >>> generation of the hypercall handler prototypes from the same database
> >>> which is used to generate the macros.
> >>>
> >>> This has the additional advantage of using type-safe calls of the
> >>> handlers and of ensuring that related handlers (e.g. PV and HVM ones)
> >>> share the same prototypes.
>>>
>>> A very brief performance test (parallel build of the Xen hypervisor
>>> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
>>> the performance with the patches applied. The test was performed using
>>> a PV and a PVH guest.
>>>
>>> Changes in V2:
>>> - new patches 6, 14, 15
>>> - patch 7: support hypercall priorities for faster code
>>> - comments addressed
>>>
>>> Changes in V3:
>>> - patches 1 and 4 removed as already applied
>>> - comments addressed
>>>
>>> Juergen Gross (13):
>>>xen: move do_vcpu_op() to arch specific code
>>>xen: harmonize return types of hypercall handlers
>>>xen: don't include asm/hypercall.h from C sources
>>>xen: include compat/platform.h from hypercall.h
>>>xen: generate hypercall interface related code
>>>xen: use generated prototypes for hypercall handlers
>>>x86/pv-shim: don't modify hypercall table
>>>xen/x86: don't use hypercall table for calling compat hypercalls
>>>xen/x86: call hypercall handlers via generated macro
>>>xen/arm: call hypercall handlers via generated macro
>>>xen/x86: add hypercall performance counters for hvm, correct pv
>>>xen: drop calls_to_multicall performance counter
>>>tools/xenperf: update hypercall names
>>
>> As it's pretty certain now that parts of this which didn't go in yet will
>> need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
>> expecting a v4 instead.
> 
> Yes, I was planning to spin that up soon.
> 
> The main remaining question is whether we want to switch the return type
> of all hypercalls (or at least the ones common to all archs) not
> requiring to return 64-bit values to "int", as Julien requested.

Could you remind me of the (sub)thread this was in, to read through the
justification again? Without recalling any details I guess I'd prefer
to stick to long for non-compat flavors.

Jan




Re: [PATCH v3 00/13] xen: drop hypercall function tables

2022-03-08 Thread Juergen Gross

On 08.03.22 09:50, Jan Beulich wrote:

On 08.03.2022 09:39, Juergen Gross wrote:

On 08.03.22 09:34, Jan Beulich wrote:

On 08.12.2021 16:55, Juergen Gross wrote:

In order to avoid indirect function calls on the hypercall path as
much as possible this series is removing the hypercall function tables
and is replacing the hypercall handler calls via the function array
by automatically generated call macros.

Another by-product of generating the call macros is the automatic
generation of the hypercall handler prototypes from the same database
which is used to generate the macros.

This has the additional advantage of using type-safe calls of the
handlers and of ensuring that related handlers (e.g. PV and HVM ones)
share the same prototypes.

A very brief performance test (parallel build of the Xen hypervisor
in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
the performance with the patches applied. The test was performed using
a PV and a PVH guest.

Changes in V2:
- new patches 6, 14, 15
- patch 7: support hypercall priorities for faster code
- comments addressed

Changes in V3:
- patches 1 and 4 removed as already applied
- comments addressed

Juergen Gross (13):
xen: move do_vcpu_op() to arch specific code
xen: harmonize return types of hypercall handlers
xen: don't include asm/hypercall.h from C sources
xen: include compat/platform.h from hypercall.h
xen: generate hypercall interface related code
xen: use generated prototypes for hypercall handlers
x86/pv-shim: don't modify hypercall table
xen/x86: don't use hypercall table for calling compat hypercalls
xen/x86: call hypercall handlers via generated macro
xen/arm: call hypercall handlers via generated macro
xen/x86: add hypercall performance counters for hvm, correct pv
xen: drop calls_to_multicall performance counter
tools/xenperf: update hypercall names


As it's pretty certain now that parts of this which didn't go in yet will
need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
expecting a v4 instead.


Yes, I was planning to spin that up soon.

The main remaining question is whether we want to switch the return type
of all hypercalls (or at least the ones common to all archs) not
requiring to return 64-bit values to "int", as Julien requested.


Could you remind me of the (sub)thread this was in, to read through the
justification again? Without recalling any details I guess I'd prefer
to stick to long for non-compat flavors.


This discussion started with:

https://lists.xen.org/archives/html/xen-devel/2021-12/threads.html#01293


Juergen





Re: [PATCH v4] vpci/msix: fix PBA accesses

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 09:31:34AM +0100, Jan Beulich wrote:
> On 07.03.2022 17:37, Roger Pau Monne wrote:
> > Map the PBA in order to access it from the MSI-X read and write
> > handlers. Note that previously the handlers would pass the physical
> > host address into the {read,write}{l,q} handlers, which is wrong as
> > those expect a linear address.
> > 
> > Map the PBA using ioremap when the first access is performed. Note
> > that 32bit arches might want to abstract the call to ioremap into a
> > vPCI arch handler, so they can use a fixmap range to map the PBA.
> > 
> > Reported-by: Jan Beulich 
> > Signed-off-by: Roger Pau Monné 
> 
> Reviewed-by: Jan Beulich 
> 
> > Cc: Alex Olson 
> 
> I'll wait a little with committing, in the hope for Alex to re-provide
> a Tested-by.
> 
> > --- a/xen/drivers/vpci/msix.c
> > +++ b/xen/drivers/vpci/msix.c
> > @@ -182,6 +182,38 @@ static struct vpci_msix_entry *get_entry(struct 
> > vpci_msix *msix,
> >  return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
> >  }
> >  
> > +static void __iomem *get_pba(struct vpci *vpci)
> > +{
> > +struct vpci_msix *msix = vpci->msix;
> > +/*
> > + * PBA will only be unmapped when the device is deassigned, so access 
> > it
> > + * without holding the vpci lock.
> > + */
> > +void __iomem *pba = read_atomic(&msix->pba);
> > +
> > +if ( likely(pba) )
> > +return pba;
> > +
> > +pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
> > +  vmsix_table_size(vpci, VPCI_MSIX_PBA));
> > +if ( !pba )
> > +return read_atomic(&msix->pba);
> > +
> > +spin_lock(&vpci->lock);
> > +if ( !msix->pba )
> > +{
> > +write_atomic(&msix->pba, pba);
> > +spin_unlock(&vpci->lock);
> > +}
> > +else
> > +{
> > +spin_unlock(&vpci->lock);
> > +iounmap(pba);
> > +}
> 
> TBH I had been hoping for just a single spin_unlock(), but you're
> the maintainer of this code ...

Would you prefer something like:

spin_lock(&vpci->lock);
if ( !msix->pba )
write_atomic(&msix->pba, pba);
spin_unlock(&vpci->lock);

if ( read_atomic(&msix->pba) != pba )
iounmap(pba);

?

Thanks, Roger.



[ovmf test] 168472: regressions - FAIL

2022-03-08 Thread osstest service owner
flight 168472 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168472/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64                   6 xen-build                fail REGR. vs. 168254
 build-amd64-xsm               6 xen-build                fail REGR. vs. 168254
 build-i386                    6 xen-build                fail REGR. vs. 168254
 build-i386-xsm                6 xen-build                fail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf 62fa37fe7b9df3c54a2d9d90aed9ff0e817ee0c6
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

 Last test of basis   168254  2022-02-28 10:41:46 Z    7 days
 Failing since        168258  2022-03-01 01:55:31 Z    7 days   76 attempts
 Testing same since   168469  2022-03-07 23:11:34 Z    0 days    2 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Guo Dong 
  Guomin Jiang 
  Hua Ma 
  Jason 
  Jason Lou 
  Li, Zhihao 
  Ma, Hua 
  Matt DeVillier 
  Patrick Rudolph 
  Sean Rhodes 
  Sebastien Boeuf 
  Xiaolu.Jiang 
  Zhihao Li 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 468 lines long.)



[xen-unstable test] 168470: tolerable FAIL - PUSHED

2022-03-08 Thread osstest service owner
flight 168470 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168470/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 168460
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168460
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168460
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168460
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168460
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168460
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168460
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168460
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 168460
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168460
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168460
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168460
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass

version targeted for testing:
 xen  9d4a44380d273de22d5753883cbf5581795ff24d
baseline version:
 xen  4cb8d7a06752b368

Re: [PATCH v2] livepatch: set -f{function,data}-sections compiler option

2022-03-08 Thread Julien Grall

Hi Roger,

On 08/03/2022 08:13, Roger Pau Monné wrote:

On Mon, Mar 07, 2022 at 05:19:53PM +, Julien Grall wrote:

Hi Roger,

On 07/03/2022 15:55, Roger Pau Monne wrote:

If livepatching support is enabled build the hypervisor with
-f{function,data}-sections compiler options, which is required by the
livepatching tools to detect changes and create livepatches.

This shouldn't result in any functional change to the hypervisor
binary image, but it does require some changes in the linker script
in order to handle that each function and data item will now be
placed into its own section in object files. As a result add
catch-all patterns for .text, .data and .bss in order to merge each
individual item section into the final image.

The main difference will be that .text.startup will end up being part
of .text rather than .init, and thus won't be freed. .text.exit will
also be part of .text rather than dropped. Overall this could make the
image bigger, and package some .text code in a sub-optimal way.

Note that placement of the sections inside of .text is also slightly
adjusted to be more similar to the position found in the default GNU
ld linker script. This requires having a separate section for the
header in order to place it at the beginning of the output image,
followed by the unlikely and related sections, and finally the main
.text section.

On Arm the .data.read_mostly needs to be moved ahead of the .data
section like it's already done on x86, and the alignment needs to be
set to PAGE_SIZE so it doesn't end up being placed at the tail of a
read-only page from the previous section. While there, move the
alignment of the .data section ahead of the section declaration, like
it's done for other sections.


This sounds like a bug not related to this patch. Can this be split
separately?


No, .data.read_mostly needs to be moved before .data so that the
catch-all .data.* pattern introduced to .data to account for
-fdata-sections doesn't end up also including .data.read_mostly.
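In linker script terms the constraint is a wildcard-ordering one; a simplified sketch (not the actual xen.lds.S):

```
/* Input sections go to the first output section whose pattern matches,
 * so .data.read_mostly must be placed before the .data catch-all or
 * the *(.data.*) wildcard would swallow it. */
. = ALIGN(PAGE_SIZE);
.data.read_mostly : {
    *(.data.read_mostly)
}
.data : {
    *(.data .data.*)
}
```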


Sorry I misread it. I thought you suggested the alignment was also wrong 
before this patch. Thanks for the clarification!






The benefit of having CONFIG_LIVEPATCH enable those compiler options
is that the livepatch build tools no longer need to fiddle with the
build system in order to enable them. Note the current livepatch tools
are broken after the recent build changes due to the way they
attempt to set -f{function,data}-sections.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
   - Introduce CC_SPLIT_SECTIONS for selecting the compiler options.
   - Drop check for compiler options, all supported versions have them.
   - Re-arrange section placement in .text, to match the default linker
 script.
   - Introduce .text.header to contain the header bits that must appear
  first in the final binary.
---
Given that now the header is explicitly placed in .text.header, it's
likely possible to simplify some of the ordering of the object files
for the prelink.o generation, as arch/x86/boot/built_in.o no longer
needs to be the first object file in the list.

It also seems on Arm the schedulers and hypfs .data sections should be
moved into read_mostly.
---
Tested by gitlab in order to assert I didn't introduce any
regression, especially on Arm.
---
   xen/Makefile  |  2 ++
   xen/arch/arm/arm32/head.S |  1 +
   xen/arch/arm/arm64/head.S |  1 +
   xen/arch/arm/xen.lds.S| 49 +--
   xen/arch/x86/boot/head.S  |  2 +-
   xen/arch/x86/xen.lds.S| 22 +++---
   xen/common/Kconfig|  4 
   7 files changed, 49 insertions(+), 32 deletions(-)

diff --git a/xen/Makefile b/xen/Makefile
index 5c21492d6f..18a4f7e101 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -273,6 +273,8 @@ else
   CFLAGS += -fomit-frame-pointer
   endif
+CFLAGS-$(CONFIG_CC_SPLIT_SECTIONS) += -ffunction-sections -fdata-sections
+
   CFLAGS += -nostdinc -fno-builtin -fno-common
   CFLAGS += -Werror -Wredundant-decls -Wno-pointer-arith
   $(call cc-option-add,CFLAGS,CC,-Wvla)
diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index 7a906167ef..c837d3054c 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -120,6 +120,7 @@
   #endif /* !CONFIG_EARLY_PRINTK */
+.section .text.header, "ax", %progbits
   .arm
   /*
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 66d862fc81..e62c48ec1c 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -133,6 +133,7 @@
   add \xb, \xb, x20
   .endm
+.section .text.header, "ax", %progbits
   /*.aarch64*/
   /*
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 08016948ab..836da880c3 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -30,9 +30,16 @@ SECTIONS
 _start = .;
 .text : {
   _stext = .;/* Text section */
+   *(.text.header)


With this change, the changes in arch/*

Re: [PATCH v2] x86/build: use --orphan-handling linker option if available

2022-03-08 Thread Roger Pau Monné
On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote:
> As was e.g. making necessary 4b7fd8153ddf ("x86: fold sections in final
> binaries"), arbitrary sections appearing without our linker script
> placing them explicitly can be a problem. Have the linker make us aware
> of such sections, so we would know that the script needs adjusting.
> 
> To deal with the resulting warnings:
> - Retain .note.* explicitly for ELF, and discard all of them (except the
>   earlier consumed .note.gnu.build-id) for PE/COFF.
> - Have explicit statements for .got, .plt, and alike and add assertions
>   that they're empty. No output sections will be created for these as
>   long as they remain empty (or else the assertions would cause early
>   failure anyway).
> - Collect all .rela.* into a single section, with again an assertion
>   added for the resulting section to be empty.
> - Extend the enumerating of .debug_* to ELF. Note that for Clang adding
>   of .debug_macinfo is necessary. Amend this by its Dwarf5 counterpart,
>   .debug_macro, then as well (albeit more may need adding for full
>   coverage).
> 
> Suggested-by: Roger Pau Monné 
> Signed-off-by: Jan Beulich 

LGTM, just two questions.

> ---
> v2: Don't use (NOLOAD) for ELF; it has undocumented (and unhelpful)
> behavior with GNU ld there. Also place .{sym,str,shstr}tab for LLVM
> ld.
> ---
> I would have wanted to make this generic (by putting it in
> xen/Makefile), but the option cannot be added to LDFLAGS, or else
> there'll be a flood of warnings with $(LD) -r. (Besides, adding to
> LDFLAGS would mean use of the option on every linker pass rather than
> just the last one.)
> 
> Retaining of .note in xen-syms is under question. Plus if we want to
> retain all notes, the question is whether they wouldn't better go into
> .init.rodata. But .note.gnu.build-id shouldn't move there, and when
> notes are discontiguous all intermediate space will also be assigned to
> the NOTE segment, thus making the contents useless for tools going just
> by program headers.
> 
> Newer Clang may require yet more .debug_* to be added. I've only played
> with versions 5 and 7 so far.
> 
> Unless we would finally drop all mentioning of Stabs sections, we may
> want to extend to there what is done for Dwarf here (allowing the EFI
> conditional around the section to also go away).
> 
> See also https://sourceware.org/pipermail/binutils/2022-March/119922.html.
> 
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -120,6 +120,8 @@ syms-warn-dup-y := --warn-dup
>  syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) :=
>  syms-warn-dup-$(CONFIG_ENFORCE_UNIQUE_SYMBOLS) := --error-dup
>  
> +orphan-handling-$(call ld-option,--orphan-handling=warn) += --orphan-handling=warn
> +
>  $(TARGET): TMP = $(@D)/.$(@F).elf32
>  $(TARGET): $(TARGET)-syms $(efi-y) $(obj)/boot/mkelf32
>   $(obj)/boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) 
> $(XEN_IMG_OFFSET) \
> @@ -146,7 +148,7 @@ $(TARGET)-syms: $(BASEDIR)/prelink.o $(o
>   >$(@D)/.$(@F).1.S
>   $(MAKE) $(build)=$(@D) $(@D)/.$(@F).1.o
>   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) \
> - $(@D)/.$(@F).1.o -o $@
> + $(orphan-handling-y) $(@D)/.$(@F).1.o -o $@
>   $(NM) -pa --format=sysv $(@D)/$(@F) \
>   | $(BASEDIR)/tools/symbols --all-symbols --xensyms --sysv 
> --sort \
>   >$(@D)/$(@F).map
> @@ -220,7 +222,7 @@ endif
>   | $(BASEDIR)/tools/symbols $(all_symbols) --sysv --sort 
> >$(@D)/.$(@F).1s.S
>   $(MAKE) $(build)=$(@D) .$(@F).1r.o .$(@F).1s.o
>   $(LD) $(call EFI_LDFLAGS,$(VIRT_BASE)) -T $(obj)/efi.lds -N $< \
> - $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o $(note_file_option) 
> -o $@
> +   $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o $(orphan-handling-y) 
> $(note_file_option) -o $@
>   $(NM) -pa --format=sysv $(@D)/$(@F) \
>   | $(BASEDIR)/tools/symbols --all-symbols --xensyms --sysv 
> --sort >$(@D)/$(@F).map
>   rm -f $(@D)/.$(@F).[0-9]* $(@D)/..$(@F).[0-9]*
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -12,6 +12,13 @@
>  #undef __XEN_VIRT_START
>  #define __XEN_VIRT_START __image_base__
>  #define DECL_SECTION(x) x :
> +/*
> + * Use the NOLOAD directive, despite currently ignored by (at least) GNU ld
> + * for PE output, in order to record that we'd prefer these sections to not
> + * be loaded into memory.
> + */
> +#define DECL_DEBUG(x, a) #x ALIGN(a) (NOLOAD) : { *(x) }
> +#define DECL_DEBUG2(x, y, a) #x ALIGN(a) (NOLOAD) : { *(x) *(y) }
>  
>  ENTRY(efi_start)
>  
> @@ -19,6 +26,8 @@ ENTRY(efi_start)
>  
>  #define FORMAT "elf64-x86-64"
>  #define DECL_SECTION(x) #x : AT(ADDR(#x) - __XEN_VIRT_START)
> +#define DECL_DEBUG(x, a) #x 0 : { *(x) }
> +#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) }

Would it be helpful to place those in a 

>  
>  ENTRY(start_pa)
>  
> @@ -179,6 +188,13 @@ SECTIONS
>  #endif
>  #endif
>  
> +#ifndef EFI
> +

[libvirt test] 168473: regressions - FAIL

2022-03-08 Thread osstest service owner
flight 168473 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168473/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  34238d7408df4d9c83e7b00accfc310545165787
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  606 days
Failing since151818  2020-07-11 04:18:52 Z  605 days  587 attempts
Testing same since   168473  2022-03-08 04:18:53 Z0 days1 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Andika Triwidada 
  Andrea Bolognani 
  Ani Sinha 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brad Laue 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Borntraeger 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Christophe Fergeau 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  dinglimin 
  Divya Garg 
  Dmitrii Shcherbakov 
  Dmytro Linkin 
  Eiichi Tsukata 
  Emilio Herrera 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Franck Ridel 
  Gavi Teitz 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Hela Basa 
  Helmut Grohne 
  Hiroki Narukawa 
  Hyman Huang(黄勇) 
  Ian Wienand 
  Ioanna Alifieraki 
  Ivan Teterevkov 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  jason lee 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jing Qi 
  Jinsheng Zhang 
  Jiri Denemark 
  Joachim Falk 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Kim InSoo 
  Koichi Murase 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Lei Yang 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Lubomir Rintel 
  Luke Yue 
  Luyao Zhong 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Chevsky 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nicolas Lécureuil 
  Nicolas Lécureuil 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Or Ozeri 
  Orion Poplawski 
  Pany 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Praveen K Paladugu 
  Richard W.M. Jones 
  Ricky Tigg 
  Robin Lee 
  Rohit Kumar 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  SeongHyun Jo 
  Shalini Chellathurai Saroja 
  

[PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Bjoern Doebel
Xen enabled CET for supporting architectures. The control flow aspect of
CET expects functions that can be called indirectly (i.e., via function
pointers) to start with an ENDBR64 instruction. Otherwise a control flow
exception is raised.

This expectation breaks livepatching flows because we patch functions by
overwriting their first 5 bytes with a JMP + , thus breaking the
ENDBR64. We fix this by checking the start of a patched function for
being ENDBR64. In the positive case we move the livepatch JMP to start
behind the ENDBR64 instruction.

To avoid having to guess the ENDBR64 offset again on patch reversal
(which might race with other mechanisms adding/removing ENDBR
dynamically), use the livepatch metadata to store the computed offset
along with the saved bytes of the overwritten function.
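The offset handling described above can be sketched standalone like this (illustrative only: the helper names are made up, mirroring the logic of the patch below rather than Xen's actual functions):

```c
#include <stdint.h>
#include <string.h>

#define ENDBR64_LEN          4
#define ARCH_PATCH_INSN_SIZE 5  /* 1-byte JMP opcode + 4-byte rel32 */

/* ENDBR64 is encoded as F3 0F 1E FA. */
static int starts_with_endbr64(const uint8_t *p)
{
    static const uint8_t endbr64[ENDBR64_LEN] = { 0xf3, 0x0f, 0x1e, 0xfa };

    return memcmp(p, endbr64, ENDBR64_LEN) == 0;
}

/* Offset into the old function at which the patch JMP is written. */
static unsigned int patch_offset(const uint8_t *old_func)
{
    return starts_with_endbr64(old_func) ? ENDBR64_LEN : 0;
}

/* rel32 displacement, relative to the end of the 5-byte JMP. */
static int32_t jmp_disp(uintptr_t old_addr, uintptr_t new_addr,
                        unsigned int off)
{
    return (int32_t)(new_addr - (old_addr + off + ARCH_PATCH_INSN_SIZE));
}
```

Storing the computed offset in the opaque metadata is what costs the one byte mentioned below (the 31 -> 30 byte NOP limit).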

Signed-off-by: Bjoern Doebel 
CC: Konrad Rzeszutek Wilk 
CC: Ross Lagerwall 

Note that on top of livepatching functions, Xen supports an additional
mode where we can "remove" a function by overwriting it with NOPs. This
is only supported for functions up to 31 bytes in size and this patch
reduces this limit to 30 bytes.

Changes since r1:
* use sizeof_field() to avoid unused variable warning
* make metadata variable const in arch_livepatch_revert
---
 xen/arch/x86/livepatch.c | 61 ++--
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
index 65530c1e57..0fd97f2a00 100644
--- a/xen/arch/x86/livepatch.c
+++ b/xen/arch/x86/livepatch.c
@@ -14,11 +14,29 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
 #include 
 
+/*
+ * CET hotpatching support: We may have functions starting with an ENDBR64
+ * instruction that MUST remain the first instruction of the function, hence
+ * we need to move any hotpatch trampoline further into the function. For that
+ * we need to keep track of the patching offset used for any loaded hotpatch
+ * (to avoid racing against other fixups adding/removing ENDBR64 or similar
+ * instructions).
+ *
+ * We do so by making use of the existing opaque metadata area. We use its
+ * first 4 bytes to track the offset into the function used for patching and
+ * the remainder of the data to store overwritten code bytes.
+ */
+struct x86_livepatch_meta {
+uint8_t patch_offset;
+uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)];
+};
+
 static bool has_active_waitqueue(const struct vm_event_domain *ved)
 {
 /* ved may be xzalloc()'d without INIT_LIST_HEAD() yet. */
@@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void)
 
 int arch_livepatch_verify_func(const struct livepatch_func *func)
 {
+BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != LIVEPATCH_OPAQUE_SIZE);
+
 /* If NOPing.. */
 if ( !func->new_addr )
 {
 /* Only do up to maximum amount we can put in the ->opaque. */
-if ( func->new_size > sizeof(func->opaque) )
+if ( func->new_size > sizeof_field(struct x86_livepatch_meta,
+   instruction) )
 return -EOPNOTSUPP;
 
 if ( func->old_size < func->new_size )
 return -EINVAL;
 }
-else if ( func->old_size < ARCH_PATCH_INSN_SIZE )
-return -EINVAL;
+else
+{
+/*
+ * Space needed now depends on whether the target function
+ * starts with an ENDBR64 instruction.
+ */
+uint8_t needed;
+
+needed = ARCH_PATCH_INSN_SIZE;
+if ( is_endbr64(func->old_addr) )
+needed += ENDBR64_LEN;
+
+if ( func->old_size < needed )
+return -EINVAL;
+}
 
 return 0;
 }
@@ -127,15 +161,21 @@ int arch_livepatch_verify_func(const struct livepatch_func *func)
 void noinline arch_livepatch_apply(struct livepatch_func *func)
 {
 uint8_t *old_ptr;
-uint8_t insn[sizeof(func->opaque)];
+struct x86_livepatch_meta *lp;
+uint8_t insn[sizeof(lp->instruction)];
 unsigned int len;
 
+lp = (struct x86_livepatch_meta *)func->opaque;
+lp->patch_offset = 0;
 old_ptr = func->old_addr;
 len = livepatch_insn_len(func);
 if ( !len )
 return;
 
-memcpy(func->opaque, old_ptr, len);
+if ( is_endbr64(old_ptr) )
+lp->patch_offset += ENDBR64_LEN;
+
+memcpy(lp->instruction, old_ptr + lp->patch_offset, len);
 if ( func->new_addr )
 {
 int32_t val;
@@ -143,14 +183,15 @@ void noinline arch_livepatch_apply(struct livepatch_func *func)
 BUILD_BUG_ON(ARCH_PATCH_INSN_SIZE != (1 + sizeof(val)));
 
 insn[0] = 0xe9; /* Relative jump. */
-val = func->new_addr - func->old_addr - ARCH_PATCH_INSN_SIZE;
+val = func->new_addr - (func->old_addr + lp->patch_offset
++ ARCH_PATCH_INSN_SIZE);
 
 memcpy(&insn[1], &val, sizeof(val));
 }
 else
 add_nops(insn, len);
 
-memcpy(old_ptr, insn, len);
+memcpy(old_ptr + lp->patch_offs

[ovmf test] 168475: regressions - FAIL

2022-03-08 Thread osstest service owner
flight 168475 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168475/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf 62fa37fe7b9df3c54a2d9d90aed9ff0e817ee0c6
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z7 days
Failing since168258  2022-03-01 01:55:31 Z7 days   77 attempts
Testing same since   168469  2022-03-07 23:11:34 Z0 days3 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Guo Dong 
  Guomin Jiang 
  Hua Ma 
  Jason 
  Jason Lou 
  Li, Zhihao 
  Ma, Hua 
  Matt DeVillier 
  Patrick Rudolph 
  Sean Rhodes 
  Sebastien Boeuf 
  Xiaolu.Jiang 
  Zhihao Li 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 468 lines long.)



Re: [PATCH v5 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-03-08 Thread Roger Pau Monné
On Mon, Mar 07, 2022 at 03:06:08PM +, Jane Malalane wrote:
> Add XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_xapic and
> XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_x2apic to report accelerated xapic
> and x2apic, on x86 hardware.

I think the commit message has gone out of sync with the code, those
should now be XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X{2,}APIC

> No such features are currently implemented on AMD hardware.
> 
> HW assisted xAPIC virtualization will be reported if HW, at the
> minimum, supports virtualize_apic_accesses as this feature alone means
> that an access to the APIC page will cause an APIC-access VM exit. An
> APIC-access VM exit provides a VMM with information about the access
> causing the VM exit, unlike a regular EPT fault, thus simplifying some
> internal handling.
> 
> HW assisted x2APIC virtualization will be reported if HW supports
> virtualize_x2apic_mode and, at least, either apic_reg_virt or
> virtual_intr_delivery. This is due to apic_reg_virt and
> virtual_intr_delivery preventing a VM exit from occurring or at least
> replacing a regular EPT fault VM-exit with an APIC-access VM-exit on

For x2APIC there's no EPT fault, because x2APIC accesses are in the
MSR space. I think you can omit this whole sentence and just mention
that this now follows the conditionals in vmx_vlapic_msr_changed.
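For reference, those conditionals amount to something like the following standalone sketch (the struct and helper names here are illustrative, not Xen's):

```c
#include <stdbool.h>

/* Stand-ins for the cpu_has_vmx_* predicates. */
struct vmx_caps {
    bool virtualize_apic_accesses;
    bool virtualize_x2apic_mode;
    bool apic_reg_virt;
    bool virtual_intr_delivery;
};

/* xAPIC: an APIC-access VM exit alone already simplifies handling. */
static bool assisted_xapic_available(const struct vmx_caps *c)
{
    return c->virtualize_apic_accesses;
}

/* x2APIC: additionally needs APIC register virtualization or virtual
 * interrupt delivery. */
static bool assisted_x2apic_available(const struct vmx_caps *c)
{
    return c->virtualize_x2apic_mode &&
           (c->apic_reg_virt || c->virtual_intr_delivery);
}
```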

> read and write APIC accesses, respectively. This also means that
> sysctl follows the conditionals in vmx_vlapic_msr_changed().
> 
> For that purpose, also add an arch-specific "capabilities" parameter
> to struct xen_sysctl_physinfo.
> 
> Note that this interface is intended to be compatible with AMD so that
> AVIC support can be introduced in a future patch. Unlike Intel that
> has multiple controls for APIC Virtualization, AMD has one global
> 'AVIC Enable' control bit, so fine-graining of APIC virtualization
> control cannot be done on a common interface.
> 
> Suggested-by: Andrew Cooper 
> Signed-off-by: Jane Malalane 
> ---
> CC: Wei Liu 
> CC: Anthony PERARD 
> CC: Juergen Gross 
> CC: Andrew Cooper 
> CC: George Dunlap 
> CC: Jan Beulich 
> CC: Julien Grall 
> CC: Stefano Stabellini 
> CC: Volodymyr Babchuk 
> CC: Bertrand Marquis 
> CC: Jun Nakajima 
> CC: Kevin Tian 
> CC: "Roger Pau Monné" 
> 
> v5:
> * Have assisted_xapic_available solely depend on
>   cpu_has_vmx_virtualize_apic_accesses and assisted_x2apic_available
>   depend on cpu_has_vmx_virtualize_x2apic_mode and
>   cpu_has_vmx_apic_reg_virt OR cpu_has_vmx_virtual_intr_delivery.
> 
> v4:
>  * Fallback to the original v2/v1 conditions for setting
>assisted_xapic_available and assisted_x2apic_available so that in
>the future APIC virtualization can be exposed on AMD hardware
>since fine-graining of "AVIC" is not supported, i.e., AMD solely
>uses "AVIC Enable". This also means that sysctl mimics what's
>exposed in CPUID.
> 
> v3:
>  * Define XEN_SYSCTL_PHYSCAP_ARCH_MAX for ABI checking and actually
>set "arch_capbilities", via a call to c_bitmap_to_ocaml_list()
>  * Have assisted_x2apic_available only depend on
>cpu_has_vmx_virtualize_x2apic_mode
> 
> v2:
>  * Use one macro LIBXL_HAVE_PHYSINFO_ASSISTED_APIC instead of two
>  * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo
>  * Set assisted_x{2}apic_available to be conditional upon "bsp" and
>annotate it with __ro_after_init
>  * Change XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_X{2}APIC to
>_X86_ASSISTED_X{2}APIC
>  * Keep XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X{2}APIC contained within
>sysctl.h
>  * Fix padding introduced in struct xen_sysctl_physinfo and bump
>XEN_SYSCTL_INTERFACE_VERSION
> ---
>  tools/golang/xenlight/helpers.gen.go |  4 
>  tools/golang/xenlight/types.gen.go   |  2 ++
>  tools/include/libxl.h|  7 +++
>  tools/libs/light/libxl.c |  3 +++
>  tools/libs/light/libxl_arch.h|  4 
>  tools/libs/light/libxl_arm.c |  5 +
>  tools/libs/light/libxl_types.idl |  2 ++
>  tools/libs/light/libxl_x86.c | 11 +++
>  tools/ocaml/libs/xc/xenctrl.ml   |  5 +
>  tools/ocaml/libs/xc/xenctrl.mli  |  5 +
>  tools/ocaml/libs/xc/xenctrl_stubs.c  | 14 +++---
>  tools/xl/xl_info.c   |  6 --
>  xen/arch/x86/hvm/vmx/vmcs.c  |  9 +
>  xen/arch/x86/include/asm/domain.h|  3 +++
>  xen/arch/x86/sysctl.c|  7 +++
>  xen/include/public/sysctl.h  | 11 ++-
>  16 files changed, 92 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/golang/xenlight/helpers.gen.go 
> b/tools/golang/xenlight/helpers.gen.go
> index b746ff1081..dd4e6c9f14 100644
> --- a/tools/golang/xenlight/helpers.gen.go
> +++ b/tools/golang/xenlight/helpers.gen.go
> @@ -3373,6 +3373,8 @@ x.CapVmtrace = bool(xc.cap_vmtrace)
>  x.CapVpmu = bool(xc.cap_vpmu)
>  x.CapGnttabV1 = bool(xc.cap_gnttab_v1)
>  x.CapGnttabV2 = bool(xc.cap_gnttab_v2)
> +x.CapAssistedXapic = bool(xc.cap_assisted_xapic)
> +x.CapAssistedX2Apic = bool(xc.cap_assis

Re: [PATCH v4] vpci/msix: fix PBA accesses

2022-03-08 Thread Jan Beulich
On 08.03.2022 10:05, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 09:31:34AM +0100, Jan Beulich wrote:
>> On 07.03.2022 17:37, Roger Pau Monne wrote:
>>> Map the PBA in order to access it from the MSI-X read and write
>>> handlers. Note that previously the handlers would pass the physical
>>> host address into the {read,write}{l,q} handlers, which is wrong as
>>> those expect a linear address.
>>>
>>> Map the PBA using ioremap when the first access is performed. Note
>>> that 32bit arches might want to abstract the call to ioremap into a
>>> vPCI arch handler, so they can use a fixmap range to map the PBA.
>>>
>>> Reported-by: Jan Beulich 
>>> Signed-off-by: Roger Pau Monné 
>>
>> Reviewed-by: Jan Beulich 
>>
>>> Cc: Alex Olson 
>>
>> I'll wait a little with committing, in the hope for Alex to re-provide
>> a Tested-by.
>>
>>> --- a/xen/drivers/vpci/msix.c
>>> +++ b/xen/drivers/vpci/msix.c
>>> @@ -182,6 +182,38 @@ static struct vpci_msix_entry *get_entry(struct 
>>> vpci_msix *msix,
>>>  return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
>>>  }
>>>  
>>> +static void __iomem *get_pba(struct vpci *vpci)
>>> +{
>>> +struct vpci_msix *msix = vpci->msix;
>>> +/*
>>> + * PBA will only be unmapped when the device is deassigned, so access 
>>> it
>>> + * without holding the vpci lock.
>>> + */
>>> +void __iomem *pba = read_atomic(&msix->pba);
>>> +
>>> +if ( likely(pba) )
>>> +return pba;
>>> +
>>> +pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
>>> +  vmsix_table_size(vpci, VPCI_MSIX_PBA));
>>> +if ( !pba )
>>> +return read_atomic(&msix->pba);
>>> +
>>> +spin_lock(&vpci->lock);
>>> +if ( !msix->pba )
>>> +{
>>> +write_atomic(&msix->pba, pba);
>>> +spin_unlock(&vpci->lock);
>>> +}
>>> +else
>>> +{
>>> +spin_unlock(&vpci->lock);
>>> +iounmap(pba);
>>> +}
>>
>> TBH I had been hoping for just a single spin_unlock(), but you're
>> the maintainer of this code ...
> 
> Would you prefer something like:
> 
> spin_lock(&vpci->lock);
> if ( !msix->pba )
> write_atomic(&msix->pba, pba);
> spin_unlock(&vpci->lock);
> 
> if ( read_atomic(&msix->pba) != pba )
> iounmap(pba);

This or (to avoid the re-read)

spin_lock(&vpci->lock);
if ( !msix->pba )
{
write_atomic(&msix->pba, pba);
pba = NULL;
}
spin_unlock(&vpci->lock);

if ( pba )
iounmap(pba);

Iirc we have similar constructs elsewhere in a few places.
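For reference, the publish-once construct under discussion can be sketched generically like this (illustrative stand-ins: a pthread mutex for vpci->lock, malloc/free for ioremap/iounmap):

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static void *published;

static void *map_resource(void) { return malloc(1); } /* ioremap() stand-in */
static void unmap_resource(void *p) { free(p); }      /* iounmap() stand-in */

/* Return the shared mapping, creating it on first use; a loser of the
 * mapping race tears its own mapping down again. */
void *get_mapping(void)
{
    void *p = __atomic_load_n(&published, __ATOMIC_RELAXED);

    if ( p )
        return p;

    p = map_resource();
    if ( !p )
        return __atomic_load_n(&published, __ATOMIC_RELAXED);

    pthread_mutex_lock(&lock);
    if ( !published )
    {
        __atomic_store_n(&published, p, __ATOMIC_RELAXED);
        p = NULL;
    }
    pthread_mutex_unlock(&lock);

    if ( p )
        unmap_resource(p);

    return __atomic_load_n(&published, __ATOMIC_RELAXED);
}
```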

Jan




Re: [PATCH v2] x86/build: use --orphan-handling linker option if available

2022-03-08 Thread Jan Beulich
On 08.03.2022 11:12, Roger Pau Monné wrote:
> On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote:
>> As was e.g. making necessary 4b7fd8153ddf ("x86: fold sections in final
>> binaries"), arbitrary sections appearing without our linker script
>> placing them explicitly can be a problem. Have the linker make us aware
>> of such sections, so we would know that the script needs adjusting.
>>
>> To deal with the resulting warnings:
>> - Retain .note.* explicitly for ELF, and discard all of them (except the
>>   earlier consumed .note.gnu.build-id) for PE/COFF.
>> - Have explicit statements for .got, .plt, and alike and add assertions
>>   that they're empty. No output sections will be created for these as
>>   long as they remain empty (or else the assertions would cause early
>>   failure anyway).
>> - Collect all .rela.* into a single section, with again an assertion
>>   added for the resulting section to be empty.
>> - Extend the enumerating of .debug_* to ELF. Note that for Clang adding
>>   of .debug_macinfo is necessary. Amend this by its Dwarf5 counterpart,
>>   .debug_macro, then as well (albeit more may need adding for full
>>   coverage).
>>
>> Suggested-by: Roger Pau Monné 
>> Signed-off-by: Jan Beulich 
> 
> LGTM, just two questions.

Sure, just that ...

>> @@ -19,6 +26,8 @@ ENTRY(efi_start)
>>  
>>  #define FORMAT "elf64-x86-64"
>>  #define DECL_SECTION(x) #x : AT(ADDR(#x) - __XEN_VIRT_START)
>> +#define DECL_DEBUG(x, a) #x 0 : { *(x) }
>> +#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) }
> 
> Would it be helpful to place those in a 

... you may have had a 3rd one?

>> @@ -179,6 +188,13 @@ SECTIONS
>>  #endif
>>  #endif
>>  
>> +#ifndef EFI
>> +  /* Retain these just for the purpose of possible analysis tools. */
>> +  DECL_SECTION(.note) {
>> +   *(.note.*)
>> +  } PHDR(note) PHDR(text)
> 
> Wouldn't it be enough to place it in the note program header?
> 
> The buildid note is already placed in .rodata, so any remaining notes
> don't need to be in a LOAD section?

All the notes will be covered by the NOTE phdr. I had this much later
in the script originally, but then the NOTE phdr covered large parts of
.init.*. Clearly that yields invalid notes, which analysis (or simple
dumping) tools wouldn't be happy about. We might be able to add 2nd
NOTE phdr, but mkelf32 assumes exactly 2 phdrs if it finds more than
one, so changes there would likely be needed then (which I'd like to
avoid for the moment). I'm also not sure in how far tools can be
expected to look for multiple NOTE phdrs ...

>> +#endif
>> +
>>_erodata = .;
>>  
>>. = ALIGN(SECTION_ALIGN);
>> @@ -266,6 +282,32 @@ SECTIONS
>> __ctors_end = .;
>>} PHDR(text)
>>  
>> +#ifndef EFI
>> +  /*
>> +   * With --orphan-sections=warn (or =error) we need to handle certain linker
>> +   * generated sections. These are all expected to be empty; respective
>> +   * ASSERT()s can be found towards the end of this file.
>> +   */
>> +  DECL_SECTION(.got) {
>> +   *(.got)
>> +  } PHDR(text)
>> +  DECL_SECTION(.got.plt) {
>> +   *(.got.plt)
>> +  } PHDR(text)
>> +  DECL_SECTION(.igot.plt) {
>> +   *(.igot.plt)
>> +  } PHDR(text)
>> +  DECL_SECTION(.iplt) {
>> +   *(.iplt)
>> +  } PHDR(text)
>> +  DECL_SECTION(.plt) {
>> +   *(.plt)
>> +  } PHDR(text)
>> +  DECL_SECTION(.rela) {
>> +   *(.rela.*)
>> +  } PHDR(text)
> 
> Why do you need to explicitly place those in the text program header?

I guess that's largely for consistency with all other directives. With the
assertions that these need to be empty, we might get away without, as long
as no linker would decide to set up another zero-size phdr for them.

Jan
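
[Editorial aside: to make the "empty section plus ASSERT" arrangement discussed above concrete, here is a minimal sketch in GNU ld script syntax. DECL_SECTION and PHDR follow the quoted patch; the section list and assertion messages are illustrative, not the exact xen.lds.S contents.]

```ld
/* Give each linker-generated section an explicit home so that
 * --orphan-handling stays quiet ... */
DECL_SECTION(.got) {
       *(.got)
} PHDR(text)
DECL_SECTION(.rela) {
       *(.rela.*)
} PHDR(text)

/* ... and fail the link early if any of them unexpectedly gains
 * content.  An empty output section is simply not emitted. */
ASSERT(SIZEOF(.got) == 0, "unexpected GOT entries")
ASSERT(SIZEOF(.rela) == 0, "unexpected runtime relocations")
```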




Re: [XEN v9 3/4] xen/arm64: io: Handle the abort due to access to stage1 translation table

2022-03-08 Thread Ayan Kumar Halder

Hi Julien,

On 07/03/2022 23:59, Julien Grall wrote:

Hi,

On 07/03/2022 22:23, Ayan Kumar Halder wrote:


On 07/03/2022 19:37, Julien Grall wrote:



On 07/03/2022 14:27, Ayan Kumar Halder wrote:

Hi Julien,


Hi Ayan,


Hi Julien,

I need a bit of clarification to understand this.





One clarification.

On 04/03/2022 10:39, Julien Grall wrote:

Hi Ayan,

On 01/03/2022 12:40, Ayan Kumar Halder wrote:
If the abort was caused due to access to stage1 translation table, Xen
will assume that the stage1 translation table is in the non MMIO region.


Reading this commit message again. I think you want to explain why 
we want to do that because, from my understanding, this is 
technically not forbidden by the Arm Arm.


From the previous discussion, we want to do this because we can't 
easily handle such a fault on an emulated region (we have no way to 
return the value read to the walker).


Sorry, can you explain this a bit more ? Do you mean that if the page 
table is located in the emulated region, map_domain_page() (called 
from p2m_next_level()) will fail.


For data abort with valid syndrome, you will have a register to write 
back the value read. When the data abort has s1ptw == 1, AFAICT, we 
have no information how to return the value.


Do you mean that for s1ptw, we get an intermediate physical address ?

    if ( hpfar_is_valid(xabt.s1ptw, fsc) )
    gpa = get_faulting_ipa(gva);

If the IPA corresponds to an emulated region, then Xen can read the 
emulated address, but can't return the value to the guest OS.


(I actually want to understand this very well).





But for emulated region, shouldn't pages be already mapped for Xen to 
access them ?


I am not sure which "pages" you are referring to here. The 
implementation of emulated regions is left to the discretion of Xen. 
This may be reading/writing to a buffer allocated by Xen or return a 
fixed value.






It will try to resolve the translation fault. If it succeeds, it will
return to the guest to retry the instruction. If not, then it means
that the table is in MMIO region which is not expected by Xen. Thus,
Xen will forward the abort to the guest.

Signed-off-by: Ayan Kumar Halder 
---

Changelog :-

v1..v8 - NA

v9 - 1. Extracted this change from "[XEN v8 2/2] xen/arm64: io: 
Support
instructions (for which ISS is not..." into a separate patch of 
its own.

The reason being this is an existing bug in the codebase.

  xen/arch/arm/io.c    | 11 +++
  xen/arch/arm/traps.c | 12 +++-
  2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index bea69ffb08..ebcb8ed548 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -128,6 +128,17 @@ void try_decode_instruction(const struct cpu_user_regs *regs,

  return;
  }
  +    /*
+ * At this point, we know that the stage1 translation table is in the MMIO
+ * region. This is not expected by Xen and thus it forwards the abort to the


We don't know that. We only know that there are no corresponding 
valid mapping in the P2M. So the address may be part of an 
emulated MMIO region or invalid.


For both cases, we will want to send an abort.

Furthermore, I would say "emulated MMIO region" rather than MMIO 
region because the P2M can also contain MMIO mapping (we usually 
call then "direct MMIO").


Can I say MMIO region (to indicate both emulated and direct) ? The 
reason being the assumption that stage1 page tables cannot be in 
the MMIO region. And thus, when check_p2m() is invoked, we do not 
invoke try_map_mmio(gaddr_to_gfn(gpa)). 


See this snippet :-

 if ( xabt.s1ptw )
 check_mmio_region = false;


Thinking a bit more of this. I would actually drop this check. We 
don't need to decode the instruction, so I don't think there is 
much benefit here in restricting access for direct MMIO. Did I miss 
anything?


Can Linux or any OS keep its page tables in the direct MMIO space ? 
If yes, then try_map_mmio() needs to be invoked to map the region, so 
that OS can access it. If not, then Xen needs to return abort because 
the OS may be behaving maliciously.


I think what matters is whether the Arm Arm would or would not allow 
it. From what I can tell there is no such restriction. So we would 
need to be cautious about restricting it further than necessary.




My understanding from previous discussion was that it does not make 
sense for linux or any OS to keep its page tables in any MMIO region 
(emulated or direct). Please correct me if mistaken.


At the moment, none of the regions emulated by Xen could be used for 
page-tables. I am also not sure how we should handle such access if 
they arise. So it is more convenient to simply forbid them.


Regarding direct MMIO, see above. Correct me if I am wrong, but it 
should not be a problem for Xen to deal with them. So while I agree 
this doesn't seem to make sense, the restriction seems unnecessary.


So the behavior will be :-

1. If the stage1 trans

[ovmf test] 168477: regressions - FAIL

2022-03-08 Thread osstest service owner
flight 168477 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168477/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64       6 xen-build    fail REGR. vs. 168254
 build-amd64-xsm   6 xen-build    fail REGR. vs. 168254
 build-i386        6 xen-build    fail REGR. vs. 168254
 build-i386-xsm    6 xen-build    fail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf 62fa37fe7b9df3c54a2d9d90aed9ff0e817ee0c6
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z    8 days
Failing since        168258  2022-03-01 01:55:31 Z    7 days   78 attempts
Testing same since   168469  2022-03-07 23:11:34 Z    0 days    4 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Guo Dong 
  Guomin Jiang 
  Hua Ma 
  Jason 
  Jason Lou 
  Li, Zhihao 
  Ma, Hua 
  Matt DeVillier 
  Patrick Rudolph 
  Sean Rhodes 
  Sebastien Boeuf 
  Xiaolu.Jiang 
  Zhihao Li 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 468 lines long.)



Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Roger Pau Monné
On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
> Introduce a new per-domain creation x86 specific flag to
> select whether hardware assisted virtualization should be used for
> x{2}APIC.
> 
> A per-domain option is added to xl in order to select the usage of
> x{2}APIC hardware assisted virtualization, as well as a global
> configuration option.
> 
> Having all APIC interaction exit to Xen for emulation is slow and can
> induce much overhead. Hardware can speed up x{2}APIC by decoding the
> APIC access and providing a VM exit with a more specific exit reason
> than a regular EPT fault or by altogether avoiding a VM exit.
> 
> On the other hand, being able to disable x{2}APIC hardware assisted
> virtualization can be useful for testing and debugging purposes.
> 
> Note: vmx_install_vlapic_mapping doesn't require modifications
> regardless of whether the guest has "Virtualize APIC accesses" enabled
> or not, i.e., setting the APIC_ACCESS_ADDR VMCS field is fine so long
> as virtualize_apic_accesses is supported by the CPU.
> 
> Suggested-by: Andrew Cooper 
> Signed-off-by: Jane Malalane 
> ---
> CC: Wei Liu 
> CC: Anthony PERARD 
> CC: Juergen Gross 
> CC: Andrew Cooper 
> CC: George Dunlap 
> CC: Jan Beulich 
> CC: Julien Grall 
> CC: Stefano Stabellini 
> CC: Christian Lindig 
> CC: David Scott 
> CC: Volodymyr Babchuk 
> CC: "Roger Pau Monné" 
> 
> v5:
> * Revert v4 changes in vmx_vlapic_msr_changed(), preserving the use of
>   the has_assisted_x{2}apic macros
> * Following changes in assisted_x{2}apic_available definitions in
>   patch 1, retighten conditionals for setting
>   XEN_HVM_CPUID_APIC_ACCESS_VIRT and XEN_HVM_CPUID_X2APIC_VIRT in
>   cpuid_hypervisor_leaves()
> 
> v4:
>  * Add has_assisted_x{2}apic macros and use them where appropriate
>  * Replace CPU checks with per-domain assisted_x{2}apic control
>options in vmx_vlapic_msr_changed() and cpuid_hypervisor_leaves(),
>following edits to assisted_x{2}apic_available definitions in
>patch 1
>Note: new assisted_x{2}apic_available definitions make later
>cpu_has_vmx_apic_reg_virt and cpu_has_vmx_virtual_intr_delivery
>checks redundant in vmx_vlapic_msr_changed()
> 
> v3:
>  * Change info in xl.cfg to better express reality and fix
>capitalization of x{2}apic
>  * Move "physinfo" variable definition to the beginning of
>libxl__domain_build_info_setdefault()
>  * Reposition brackets in if statement to match libxl coding style
>  * Shorten logic in libxl__arch_domain_build_info_setdefault()
>  * Correct dprintk message in arch_sanitise_domain_config()
>  * Make appropriate changes in vmx_vlapic_msr_changed() and
>cpuid_hypervisor_leaves() for amended "assisted_x2apic" bit
>  * Remove unneeded parentheses
> 
> v2:
>  * Add a LIBXL_HAVE_ASSISTED_APIC macro
>  * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo
>  * Add a return statement in now "int"
>libxl__arch_domain_build_info_setdefault()
>  * Preserve libxl__arch_domain_build_info_setdefault 's location in
>libxl_create.c
>  * Correct x{2}apic default setting logic in
>libxl__arch_domain_prepare_config()
>  * Correct logic for parsing assisted_x{2}apic host/guest options in
>xl_parse.c and initialize them to -1 in xl.c
>  * Use guest options directly in vmx_vlapic_msr_changed
>  * Fix indentation of bool assisted_x{2}apic in struct hvm_domain
>  * Add a change in xenctrl_stubs.c to pass xenctrl ABI checks
> ---
>  docs/man/xl.cfg.5.pod.in| 19 +++
>  docs/man/xl.conf.5.pod.in   | 12 
>  tools/golang/xenlight/helpers.gen.go| 12 
>  tools/golang/xenlight/types.gen.go  |  2 ++
>  tools/include/libxl.h   |  7 +++
>  tools/libs/light/libxl_arch.h   |  5 +++--
>  tools/libs/light/libxl_arm.c|  7 +--
>  tools/libs/light/libxl_create.c | 22 +-
>  tools/libs/light/libxl_types.idl|  2 ++
>  tools/libs/light/libxl_x86.c| 28 ++--
>  tools/ocaml/libs/xc/xenctrl.ml  |  2 ++
>  tools/ocaml/libs/xc/xenctrl.mli |  2 ++
>  tools/ocaml/libs/xc/xenctrl_stubs.c |  2 +-
>  tools/xl/xl.c   |  8 
>  tools/xl/xl.h   |  2 ++
>  tools/xl/xl_parse.c | 16 
>  xen/arch/x86/domain.c   | 28 +++-
>  xen/arch/x86/hvm/vmx/vmcs.c |  4 
>  xen/arch/x86/hvm/vmx/vmx.c  | 13 -
>  xen/arch/x86/include/asm/hvm/domain.h   |  6 ++
>  xen/arch/x86/include/asm/hvm/vmx/vmcs.h |  3 +++
>  xen/arch/x86/traps.c|  9 +
>  xen/include/public/arch-x86/xen.h   |  2 ++
>  23 files changed, 183 insertions(+), 30 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index b98d161398..dcca564a23 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ 

Re: [PATCH v2] x86/build: use --orphan-handling linker option if available

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 12:15:04PM +0100, Jan Beulich wrote:
> On 08.03.2022 11:12, Roger Pau Monné wrote:
> > On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote:
> >> As was e.g. making necessary 4b7fd8153ddf ("x86: fold sections in final
> >> binaries"), arbitrary sections appearing without our linker script
> >> placing them explicitly can be a problem. Have the linker make us aware
> >> of such sections, so we would know that the script needs adjusting.
> >>
> >> To deal with the resulting warnings:
> >> - Retain .note.* explicitly for ELF, and discard all of them (except the
> >>   earlier consumed .note.gnu.build-id) for PE/COFF.
> >> - Have explicit statements for .got, .plt, and the like and add assertions
> >>   that they're empty. No output sections will be created for these as
> >>   long as they remain empty (or else the assertions would cause early
> >>   failure anyway).
> >> - Collect all .rela.* into a single section, with again an assertion
> >>   added for the resulting section to be empty.
> >> - Extend the enumerating of .debug_* to ELF. Note that for Clang adding
> >>   of .debug_macinfo is necessary. Amend this by its Dwarf5 counterpart,
> >>   .debug_macro, then as well (albeit more may need adding for full
> >>   coverage).
> >>
> >> Suggested-by: Roger Pau Monné 
> >> Signed-off-by: Jan Beulich 
> > 
> > LGTM, just two questions.
> 
> Sure, just that ...
> 
> >> @@ -19,6 +26,8 @@ ENTRY(efi_start)
> >>  
> >>  #define FORMAT "elf64-x86-64"
> >>  #define DECL_SECTION(x) #x : AT(ADDR(#x) - __XEN_VIRT_START)
> >> +#define DECL_DEBUG(x, a) #x 0 : { *(x) }
> >> +#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) }
> > 
> > Would it be helpful to place those in a 
> 
> ... you may have had a 3rd one?

Oh, no, I just forgot to remove this. I was going to ask whether we
could place those in something akin to a PT_NOLOAD program section,
but that doesn't exist AFAICT (and even if possible would require
adjustments to mkelf32).

> 
> >> @@ -179,6 +188,13 @@ SECTIONS
> >>  #endif
> >>  #endif
> >>  
> >> +#ifndef EFI
> >> +  /* Retain these just for the purpose of possible analysis tools. */
> >> +  DECL_SECTION(.note) {
> >> +   *(.note.*)
> >> +  } PHDR(note) PHDR(text)
> > 
> > Wouldn't it be enough to place it in the note program header?
> > 
> > The buildid note is already placed in .rodata, so any remaining notes
> > don't need to be in a LOAD section?
> 
> All the notes will be covered by the NOTE phdr. I had this much later
> in the script originally, but then the NOTE phdr covered large parts of
> .init.*. Clearly that yields invalid notes, which analysis (or simple
> dumping) tools wouldn't be happy about. We might be able to add 2nd
> NOTE phdr, but mkelf32 assumes exactly 2 phdrs if it finds more than
> one, so changes there would likely be needed then (which I'd like to
> avoid for the moment). I'm also not sure in how far tools can be
> expected to look for multiple NOTE phdrs ...

But if we are adding a .note section now we might as well merge it
with .note.gnu.build-id:

  DECL_SECTION(.note) {
   __note_gnu_build_id_start = .;
   *(.note.gnu.build-id)
   __note_gnu_build_id_end = .;
   *(.note.*)
  } PHDR(note) PHDR(text)

And drop the .note.Xen section?

> >> +#endif
> >> +
> >>_erodata = .;
> >>  
> >>. = ALIGN(SECTION_ALIGN);
> >> @@ -266,6 +282,32 @@ SECTIONS
> >> __ctors_end = .;
> >>} PHDR(text)
> >>  
> >> +#ifndef EFI
> >> +  /*
> >> +   * With --orphan-sections=warn (or =error) we need to handle certain linker
> >> +   * generated sections. These are all expected to be empty; respective
> >> +   * ASSERT()s can be found towards the end of this file.
> >> +   */
> >> +  DECL_SECTION(.got) {
> >> +   *(.got)
> >> +  } PHDR(text)
> >> +  DECL_SECTION(.got.plt) {
> >> +   *(.got.plt)
> >> +  } PHDR(text)
> >> +  DECL_SECTION(.igot.plt) {
> >> +   *(.igot.plt)
> >> +  } PHDR(text)
> >> +  DECL_SECTION(.iplt) {
> >> +   *(.iplt)
> >> +  } PHDR(text)
> >> +  DECL_SECTION(.plt) {
> >> +   *(.plt)
> >> +  } PHDR(text)
> >> +  DECL_SECTION(.rela) {
> >> +   *(.rela.*)
> >> +  } PHDR(text)
> > 
> > Why do you need to explicitly place those in the text program header?
> 
> I guess that's largely for consistency with all other directives. With the
> assertions that these need to be empty, we might get away without, as long
> as no linker would decide to set up another zero-size phdr for them.

We already set the debug sections to not be part of any program header
and seem to get away with it. I'm not sure how different the sections
handled below would be, linkers might indeed want to place them
regardless?

If so it might be good to add a comment that while those should be
empty (and thus don't end up in any program header) we assign them to
the text one in order to avoid the linker from creating a new program
header for them.

Thanks, Roger.
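
[Editorial aside: the phdr arrangement debated above can be pictured with a simplified, hypothetical script fragment. The names follow the quoted context; the real xen.lds.S differs in detail, and mkelf32's two-phdr assumption is the reason a second NOTE phdr is avoided.]

```ld
/* Simplified sketch: a single NOTE phdr covering one .note output
 * section, which is additionally kept in the loadable text phdr so
 * no loader needs to special-case it. */
PHDRS
{
  text PT_LOAD ;
  note PT_NOTE ;
}
SECTIONS
{
  .note : { *(.note.*) } :note :text
}
```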



Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Jan Beulich
On 08.03.2022 12:38, Roger Pau Monné wrote:
> On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
>> @@ -685,13 +687,31 @@ int arch_sanitise_domain_config(struct 
>> xen_domctl_createdomain *config)
>>  }
>>  }
>>  
>> -if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
>> +if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED |
>> + XEN_X86_ASSISTED_XAPIC |
>> + XEN_X86_ASSISTED_X2APIC) )
>>  {
>>  dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n",
>>  config->arch.misc_flags);
>>  return -EINVAL;
>>  }
>>  
>> +if ( (assisted_xapic || assisted_x2apic) && !hvm )
>> +{
>> +dprintk(XENLOG_INFO,
>> +"Interrupt Controller Virtualization not supported for PV\n");
>> +return -EINVAL;
>> +}
>> +
>> +if ( (assisted_xapic && !assisted_xapic_available) ||
>> + (assisted_x2apic && !assisted_x2apic_available) )
>> +{
>> +dprintk(XENLOG_INFO,
>> +"Hardware assisted x%sAPIC requested but not available\n",
>> +assisted_xapic && !assisted_xapic_available ? "" : "2");
>> +return -EINVAL;
> 
> I think for those two you could return -ENODEV if others agree.

If by "two" you mean the xAPIC and x2APIC aspects here (and not e.g. this
and the earlier if()), then I agree. I'm always in favor of using distinct
error codes when possible and at least halfway sensible.

Jan




Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 01:24:23PM +0100, Jan Beulich wrote:
> On 08.03.2022 12:38, Roger Pau Monné wrote:
> > On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
> >> @@ -685,13 +687,31 @@ int arch_sanitise_domain_config(struct 
> >> xen_domctl_createdomain *config)
> >>  }
> >>  }
> >>  
> >> -if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
> >> +if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED |
> >> + XEN_X86_ASSISTED_XAPIC |
> >> + XEN_X86_ASSISTED_X2APIC) )
> >>  {
> >>  dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n",
> >>  config->arch.misc_flags);
> >>  return -EINVAL;
> >>  }
> >>  
> >> +if ( (assisted_xapic || assisted_x2apic) && !hvm )
> >> +{
> >> +dprintk(XENLOG_INFO,
> >> +"Interrupt Controller Virtualization not supported for PV\n");
> >> +return -EINVAL;
> >> +}
> >> +
> >> +if ( (assisted_xapic && !assisted_xapic_available) ||
> >> + (assisted_x2apic && !assisted_x2apic_available) )
> >> +{
> >> +dprintk(XENLOG_INFO,
> >> +"Hardware assisted x%sAPIC requested but not available\n",
> >> +assisted_xapic && !assisted_xapic_available ? "" : "2");
> >> +return -EINVAL;
> > 
> > I think for those two you could return -ENODEV if others agree.
> 
> If by "two" you mean the xAPIC and x2APIC aspects here (and not e.g. this
> and the earlier if()), then I agree. I'm always in favor of using distinct
> error codes when possible and at least halfway sensible.

I would be fine by using it for the !hvm if also. IMO it makes sense
as PV doesn't have an APIC 'device' at all, so ENODEV would seem
fitting. EINVAL is also fine as the caller shouldn't even attempt that
in the first place.

So let's use it for the last if only.

Thanks, Roger.



Re: [PATCH v2] x86/build: use --orphan-handling linker option if available

2022-03-08 Thread Jan Beulich
On 08.03.2022 13:11, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 12:15:04PM +0100, Jan Beulich wrote:
>> On 08.03.2022 11:12, Roger Pau Monné wrote:
>>> On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote:
 @@ -179,6 +188,13 @@ SECTIONS
  #endif
  #endif
  
 +#ifndef EFI
 +  /* Retain these just for the purpose of possible analysis tools. */
 +  DECL_SECTION(.note) {
 +   *(.note.*)
 +  } PHDR(note) PHDR(text)
>>>
>>> Wouldn't it be enough to place it in the note program header?
>>>
>>> The buildid note is already placed in .rodata, so any remaining notes
>>> don't need to be in a LOAD section?
>>
>> All the notes will be covered by the NOTE phdr. I had this much later
>> in the script originally, but then the NOTE phdr covered large parts of
>> .init.*. Clearly that yields invalid notes, which analysis (or simple
>> dumping) tools wouldn't be happy about. We might be able to add 2nd
>> NOTE phdr, but mkelf32 assumes exactly 2 phdrs if it finds more than
>> one, so changes there would likely be needed then (which I'd like to
>> avoid for the moment). I'm also not sure in how far tools can be
>> expected to look for multiple NOTE phdrs ...
> 
> But if we are adding a .note section now we might as well merge it
> with .note.gnu.build-id:
> 
>   DECL_SECTION(.note) {
>__note_gnu_build_id_start = .;
>*(.note.gnu.build-id)
>__note_gnu_build_id_end = .;
>*(.note.*)
>   } PHDR(note) PHDR(text)
> 
> And drop the .note.Xen section?

In an ideal world we likely could, yes. But do we know for sure that
nothing recognizes the Xen notes by section name? .note.gnu.build-id
cannot be folded in any event - see the rule for generating note.o,
to be used by xen.efi linking in certain cases.

 +#endif
 +
_erodata = .;
  
. = ALIGN(SECTION_ALIGN);
 @@ -266,6 +282,32 @@ SECTIONS
 __ctors_end = .;
} PHDR(text)
  
 +#ifndef EFI
 +  /*
 +   * With --orphan-sections=warn (or =error) we need to handle certain linker
 +   * generated sections. These are all expected to be empty; respective
 +   * ASSERT()s can be found towards the end of this file.
 +   */
 +  DECL_SECTION(.got) {
 +   *(.got)
 +  } PHDR(text)
 +  DECL_SECTION(.got.plt) {
 +   *(.got.plt)
 +  } PHDR(text)
 +  DECL_SECTION(.igot.plt) {
 +   *(.igot.plt)
 +  } PHDR(text)
 +  DECL_SECTION(.iplt) {
 +   *(.iplt)
 +  } PHDR(text)
 +  DECL_SECTION(.plt) {
 +   *(.plt)
 +  } PHDR(text)
 +  DECL_SECTION(.rela) {
 +   *(.rela.*)
 +  } PHDR(text)
>>>
>>> Why do you need to explicitly place those in the text program header?
>>
>> I guess that's largely for consistency with all other directives. With the
>> assertions that these need to be empty, we might get away without, as long
>> as no linker would decide to set up another zero-size phdr for them.
> 
> We already set the debug sections to not be part of any program header
> and seem to get away with it. I'm not sure how different the sections
> handled below would be, linkers might indeed want to place them
> regardless?

Simply because I don't know I'd like to be on the safe side. Debug sections
can't really be taken as reference: At least GNU ld heavily special-cases
them anyway.

> If so it might be good to add a comment that while those should be
> empty (and thus don't end up in any program header) we assign them to
> the text one in order to avoid the linker from creating a new program
> header for them.

I'll add a sentence to the comment I'm already adding here.

Jan




Re: [PATCH v4] vpci/msix: fix PBA accesses

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 11:46:20AM +0100, Jan Beulich wrote:
> On 08.03.2022 10:05, Roger Pau Monné wrote:
> > On Tue, Mar 08, 2022 at 09:31:34AM +0100, Jan Beulich wrote:
> >> On 07.03.2022 17:37, Roger Pau Monne wrote:
> >>> Map the PBA in order to access it from the MSI-X read and write
> >>> handlers. Note that previously the handlers would pass the physical
> >>> host address into the {read,write}{l,q} handlers, which is wrong as
> >>> those expect a linear address.
> >>>
> >>> Map the PBA using ioremap when the first access is performed. Note
> >>> that 32bit arches might want to abstract the call to ioremap into a
> >>> vPCI arch handler, so they can use a fixmap range to map the PBA.
> >>>
> >>> Reported-by: Jan Beulich 
> >>> Signed-off-by: Roger Pau Monné 
> >>
> >> Reviewed-by: Jan Beulich 
> >>
> >>> Cc: Alex Olson 
> >>
> >> I'll wait a little with committing, in the hope for Alex to re-provide
> >> a Tested-by.
> >>
> >>> --- a/xen/drivers/vpci/msix.c
> >>> +++ b/xen/drivers/vpci/msix.c
> >>> @@ -182,6 +182,38 @@ static struct vpci_msix_entry *get_entry(struct vpci_msix *msix,
> >>>  return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
> >>>  }
> >>>  
> >>> +static void __iomem *get_pba(struct vpci *vpci)
> >>> +{
> >>> +struct vpci_msix *msix = vpci->msix;
> >>> +/*
> >>> + * PBA will only be unmapped when the device is deassigned, so access it
> >>> + * without holding the vpci lock.
> >>> + */
> >>> +void __iomem *pba = read_atomic(&msix->pba);
> >>> +
> >>> +if ( likely(pba) )
> >>> +return pba;
> >>> +
> >>> +pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
> >>> +  vmsix_table_size(vpci, VPCI_MSIX_PBA));
> >>> +if ( !pba )
> >>> +return read_atomic(&msix->pba);
> >>> +
> >>> +spin_lock(&vpci->lock);
> >>> +if ( !msix->pba )
> >>> +{
> >>> +write_atomic(&msix->pba, pba);
> >>> +spin_unlock(&vpci->lock);
> >>> +}
> >>> +else
> >>> +{
> >>> +spin_unlock(&vpci->lock);
> >>> +iounmap(pba);
> >>> +}
> >>
> >> TBH I had been hoping for just a single spin_unlock(), but you're
> >> the maintainer of this code ...
> > 
> > Would you prefer something like:
> > 
> > spin_lock(&vpci->lock);
> > if ( !msix->pba )
> > write_atomic(&msix->pba, pba);
> > spin_unlock(&vpci->lock);
> > 
> > if ( read_atomic(&msix->pba) != pba )
> > iounmap(pba);
> 
> This or (to avoid the re-read)
> 
> spin_lock(&vpci->lock);
> if ( !msix->pba )
> {
> write_atomic(&msix->pba, pba);
> pba = NULL;
> }
> spin_unlock(&vpci->lock);
> 
> if ( pba )
> iounmap(pba);
> 
> Iirc we have similar constructs elsewhere in a few places.

I don't really have a strong opinion. I agree the duplicated
spin_unlock() call is not nice, but I think the logic is easier to
follow by using a single if ... else ... section.

Feel free to adjust at commit, or else I can resend if you prefer.

Thanks, Roger.



Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Andrew Cooper
On 08/03/2022 10:29, Bjoern Doebel wrote:
> @@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void)
>  
>  int arch_livepatch_verify_func(const struct livepatch_func *func)
>  {
> +BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != LIVEPATCH_OPAQUE_SIZE);
> +
>  /* If NOPing.. */
>  if ( !func->new_addr )
>  {
>  /* Only do up to maximum amount we can put in the ->opaque. */
> -if ( func->new_size > sizeof(func->opaque) )
> +if ( func->new_size > sizeof_field(struct x86_livepatch_meta,
> +   instruction) )
>  return -EOPNOTSUPP;
>  
>  if ( func->old_size < func->new_size )
>  return -EINVAL;
>  }
> -else if ( func->old_size < ARCH_PATCH_INSN_SIZE )
> -return -EINVAL;
> +else
> +{
> +/*
> + * Space needed now depends on whether the target function
> + * starts with an ENDBR64 instruction.
> + */
> +uint8_t needed;
> +
> +needed = ARCH_PATCH_INSN_SIZE;
> +if ( is_endbr64(func->old_addr) )
> +needed += ENDBR64_LEN;

This won't work for cf_clobber targets, I don't think.  The ENDBR gets
converted to NOP4 and fails this check, but the altcalls calling
old_func had their displacements adjusted by +4.

The is_endbr64() check will fail, and the 5-byte jmp will be written at
the start of the function, and corrupt the instruction stream for the
altcall()'d callers.

Let me write an incremental patch to help.

~Andrew


Re: [PATCH v3 00/13] xen: drop hypercall function tables

2022-03-08 Thread Jan Beulich
On 08.03.2022 09:39, Juergen Gross wrote:
> On 08.03.22 09:34, Jan Beulich wrote:
>> On 08.12.2021 16:55, Juergen Gross wrote:
>>> In order to avoid indirect function calls on the hypercall path as
>>> much as possible this series is removing the hypercall function tables
>>> and is replacing the hypercall handler calls via the function array
>>> by automatically generated call macros.
>>>
>>> Another by-product of generating the call macros is the automatic
>>> generation of the hypercall handler prototypes from the same database
>>> which is used to generate the macros.
>>>
>>> This has the additional advantage of using type safe calls of the
>>> handlers and to ensure related handlers (e.g. PV and HVM ones) share
>>> the same prototypes.
>>>
>>> A very brief performance test (parallel build of the Xen hypervisor
>>> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
>>> the performance with the patches applied. The test was performed using
>>> a PV and a PVH guest.
>>>
>>> Changes in V2:
>>> - new patches 6, 14, 15
>>> - patch 7: support hypercall priorities for faster code
>>> - comments addressed
>>>
>>> Changes in V3:
>>> - patches 1 and 4 removed as already applied
>>> - comments addressed
>>>
>>> Juergen Gross (13):
>>>xen: move do_vcpu_op() to arch specific code
>>>xen: harmonize return types of hypercall handlers
>>>xen: don't include asm/hypercall.h from C sources
>>>xen: include compat/platform.h from hypercall.h
>>>xen: generate hypercall interface related code
>>>xen: use generated prototypes for hypercall handlers
>>>x86/pv-shim: don't modify hypercall table
>>>xen/x86: don't use hypercall table for calling compat hypercalls
>>>xen/x86: call hypercall handlers via generated macro
>>>xen/arm: call hypercall handlers via generated macro
>>>xen/x86: add hypercall performance counters for hvm, correct pv
>>>xen: drop calls_to_multicall performance counter
>>>tools/xenperf: update hypercall names
>>
>> As it's pretty certain now that parts of this which didn't go in yet will
>> need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
>> expecting a v4 instead.
> 
> Yes, I was planning to spin that up soon.
> 
> The main remaining question is whether we want to switch the return type
> of all hypercalls (or at least the ones common to all archs) that don't
> need to return 64-bit values to "int", as Julien requested.

After walking through the earlier discussion (Jürgen - thanks for the link)
I'm inclined to say that if Arm wants their return values limited to 32 bits
(with exceptions where needed), so be it. But on x86 I'd rather not see us
change this aspect. Of course I'd much prefer if architectures didn't
diverge in this regard, yet then again Arm has already diverged in avoiding
the compat layer (in this case I view the divergence as helpful, though, as
it avoids unnecessary headache).

Jan




Re: [PATCH v3 00/13] xen: drop hypercall function tables

2022-03-08 Thread Juergen Gross

On 08.03.22 13:50, Jan Beulich wrote:

On 08.03.2022 09:39, Juergen Gross wrote:

On 08.03.22 09:34, Jan Beulich wrote:

On 08.12.2021 16:55, Juergen Gross wrote:

In order to avoid indirect function calls on the hypercall path as
much as possible, this series removes the hypercall function tables
and replaces the calls of the hypercall handlers via the function
array with automatically generated call macros.

Another by-product of generating the call macros is the automatic
generation of the hypercall handler prototypes from the same database
which is used to generate the macros.

This has the additional advantage of using type-safe calls of the
handlers and of ensuring that related handlers (e.g. PV and HVM ones)
share the same prototypes.

A very brief performance test (parallel build of the Xen hypervisor
in a 6 vcpu guest) showed a very slim performance improvement (less
than 1%) with the patches applied. The test was performed using a PV
and a PVH guest.

Changes in V2:
- new patches 6, 14, 15
- patch 7: support hypercall priorities for faster code
- comments addressed

Changes in V3:
- patches 1 and 4 removed as already applied
- comments addressed

Juergen Gross (13):
xen: move do_vcpu_op() to arch specific code
xen: harmonize return types of hypercall handlers
xen: don't include asm/hypercall.h from C sources
xen: include compat/platform.h from hypercall.h
xen: generate hypercall interface related code
xen: use generated prototypes for hypercall handlers
x86/pv-shim: don't modify hypercall table
xen/x86: don't use hypercall table for calling compat hypercalls
xen/x86: call hypercall handlers via generated macro
xen/arm: call hypercall handlers via generated macro
xen/x86: add hypercall performance counters for hvm, correct pv
xen: drop calls_to_multicall performance counter
tools/xenperf: update hypercall names


As it's pretty certain now that parts of this which didn't go in yet will
need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
expecting a v4 instead.


Yes, I was planning to spin that up soon.

The main remaining question is whether we want to switch the return type
of all hypercalls (or at least the ones common to all archs) not
requiring to return 64-bit values to "int", as Julien requested.


After walking through the earlier discussion (Jürgen - thanks for the link)
I'm inclined to say that if Arm wants their return values limited to 32 bits
(with exceptions where needed), so be it. But on x86 I'd rather not see us
change this aspect. Of course I'd much prefer if architectures didn't
diverge in this regard, yet then again Arm has already diverged in avoiding
the compat layer (in this case I view the divergence as helpful, though, as
it avoids unnecessary headache).


How to handle this in common code then? Have a hypercall_ret_t type
(exact naming TBD) which is defined as long on x86 and int on Arm?
Or use long in the handlers and check the value on Arm side to be a
valid 32-bit signed int (this would be cumbersome for the exceptions,
though)?


Juergen
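The two options outlined above can be made concrete with a small sketch. This is illustrative only, not Xen code: `hypercall_ret_t` is the name floated in the thread, but the `CONFIG_X86` guard usage and the `ret_fits_32bit` helper name are assumptions for the sake of the example.

```c
#include <assert.h>
#include <limits.h>

/* Option 1: a per-architecture return type (exact naming TBD). */
#ifdef CONFIG_X86
typedef long hypercall_ret_t;
#else /* e.g. Arm */
typedef int hypercall_ret_t;
#endif

/*
 * Option 2: handlers keep returning long, and the Arm side checks
 * that the value is a valid signed 32-bit quantity before handing
 * it back to the guest. Helper name is hypothetical.
 */
static int ret_fits_32bit(long ret)
{
    return ret >= INT_MIN && ret <= INT_MAX;
}
```

Option 2 would still need per-hypercall exceptions for the handlers that legitimately return 64-bit values, which is the cumbersome part.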



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[ovmf test] 168478: regressions - FAIL

2022-03-08 Thread osstest service owner
flight 168478 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168478/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm           6 xen-build    fail REGR. vs. 168254
 build-amd64               6 xen-build    fail REGR. vs. 168254
 build-i386                6 xen-build    fail REGR. vs. 168254
 build-i386-xsm            6 xen-build    fail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf 62fa37fe7b9df3c54a2d9d90aed9ff0e817ee0c6
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

 Last test of basis   168254  2022-02-28 10:41:46 Z    8 days
 Failing since        168258  2022-03-01 01:55:31 Z    7 days   79 attempts
 Testing same since   168469  2022-03-07 23:11:34 Z    0 days    5 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Guo Dong 
  Guomin Jiang 
  Hua Ma 
  Jason 
  Jason Lou 
  Li, Zhihao 
  Ma, Hua 
  Matt DeVillier 
  Patrick Rudolph 
  Sean Rhodes 
  Sebastien Boeuf 
  Xiaolu.Jiang 
  Zhihao Li 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 468 lines long.)



Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Konrad Rzeszutek Wilk
On Tue, Mar 08, 2022 at 12:44:54PM +, Andrew Cooper wrote:
> On 08/03/2022 10:29, Bjoern Doebel wrote:
> > @@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void)
> >  
> >  int arch_livepatch_verify_func(const struct livepatch_func *func)
> >  {
> > +BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != 
> > LIVEPATCH_OPAQUE_SIZE);
> > +
> >  /* If NOPing.. */
> >  if ( !func->new_addr )
> >  {
> >  /* Only do up to maximum amount we can put in the ->opaque. */
> > -if ( func->new_size > sizeof(func->opaque) )
> > +if ( func->new_size > sizeof_field(struct x86_livepatch_meta,
> > +   instruction) )
> >  return -EOPNOTSUPP;
> >  
> >  if ( func->old_size < func->new_size )
> >  return -EINVAL;
> >  }
> > -else if ( func->old_size < ARCH_PATCH_INSN_SIZE )
> > -return -EINVAL;
> > +else
> > +{
> > +/*
> > + * Space needed now depends on whether the target function
> > + * starts with an ENDBR64 instruction.
> > + */
> > +uint8_t needed;
> > +
> > +needed = ARCH_PATCH_INSN_SIZE;
> > +if ( is_endbr64(func->old_addr) )
> > +needed += ENDBR64_LEN;
> 
> This won't work for cf_clobber targets, I don't think.  The ENDBR gets
> converted to NOP4 and fails this check, but the altcalls calling
> old_func had their displacements adjusted by +4.
> 
> The is_endbr64() check will fail, and the 5-byte jmp will be written at
> the start of the function, and corrupt the instruction stream for the
> altcall()'d callers.
> 
> Let me write an incremental patch to help.

Please add Acked-by: Konrad Rzeszutek Wilk 
on the patches.

Thank you
> 
> ~Andrew



Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Doebel, Bjoern



On 08.03.22 13:44, Andrew Cooper wrote:




On 08/03/2022 10:29, Bjoern Doebel wrote:

@@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void)

  int arch_livepatch_verify_func(const struct livepatch_func *func)
  {
+BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != LIVEPATCH_OPAQUE_SIZE);
+
  /* If NOPing.. */
  if ( !func->new_addr )
  {
  /* Only do up to maximum amount we can put in the ->opaque. */
-if ( func->new_size > sizeof(func->opaque) )
+if ( func->new_size > sizeof_field(struct x86_livepatch_meta,
+   instruction) )
  return -EOPNOTSUPP;

  if ( func->old_size < func->new_size )
  return -EINVAL;
  }
-else if ( func->old_size < ARCH_PATCH_INSN_SIZE )
-return -EINVAL;
+else
+{
+/*
+ * Space needed now depends on whether the target function
+ * starts with an ENDBR64 instruction.
+ */
+uint8_t needed;
+
+needed = ARCH_PATCH_INSN_SIZE;
+if ( is_endbr64(func->old_addr) )
+needed += ENDBR64_LEN;


This won't work for cf_clobber targets, I don't think.  The ENDBR gets
converted to NOP4 and fails this check, but the altcalls calling
old_func had their displacements adjusted by +4.

The is_endbr64() check will fail, and the 5-byte jmp will be written at
the start of the function, and corrupt the instruction stream for the
altcall()'d callers.

Let me write an incremental patch to help.


Thanks. Will you be adding a

   memcmp(func->old_addr, ideal_nops[4], 4)

or is that once more too naive?

Bjoern


~Andrew




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
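Bjoern's memcmp question above can be sketched in a few lines. This is illustrative only, not Xen code: the 4-byte NOP encoding shown (`0F 1F 40 00`) is just one of several encodings an `ideal_nops`-style table may contain depending on the CPU, which is exactly why a single memcmp against one fixed pattern can indeed be too naive.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ENDBR64_LEN 4

/* Architectural ENDBR64 encoding: F3 0F 1E FA. */
static const uint8_t endbr64[ENDBR64_LEN] = { 0xf3, 0x0f, 0x1e, 0xfa };
/* One common 4-byte long-NOP encoding; others exist. */
static const uint8_t nop4[ENDBR64_LEN]    = { 0x0f, 0x1f, 0x40, 0x00 };

/*
 * True if the function entry still holds ENDBR64, or the single NOP4
 * encoding a cf_clobber pass might have replaced it with. A check like
 * this misses any other NOP encoding the NOP table could have used.
 */
static bool entry_has_endbr_slot(const uint8_t *addr)
{
    return !memcmp(addr, endbr64, ENDBR64_LEN) ||
           !memcmp(addr, nop4, ENDBR64_LEN);
}
```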




Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Doebel, Bjoern



On 08.03.22 14:06, Konrad Rzeszutek Wilk wrote:




On Tue, Mar 08, 2022 at 12:44:54PM +, Andrew Cooper wrote:

On 08/03/2022 10:29, Bjoern Doebel wrote:

@@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void)

  int arch_livepatch_verify_func(const struct livepatch_func *func)
  {
+BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != LIVEPATCH_OPAQUE_SIZE);
+
  /* If NOPing.. */
  if ( !func->new_addr )
  {
  /* Only do up to maximum amount we can put in the ->opaque. */
-if ( func->new_size > sizeof(func->opaque) )
+if ( func->new_size > sizeof_field(struct x86_livepatch_meta,
+   instruction) )
  return -EOPNOTSUPP;

  if ( func->old_size < func->new_size )
  return -EINVAL;
  }
-else if ( func->old_size < ARCH_PATCH_INSN_SIZE )
-return -EINVAL;
+else
+{
+/*
+ * Space needed now depends on whether the target function
+ * starts with an ENDBR64 instruction.
+ */
+uint8_t needed;
+
+needed = ARCH_PATCH_INSN_SIZE;
+if ( is_endbr64(func->old_addr) )
+needed += ENDBR64_LEN;


This won't work for cf_clobber targets, I don't think.  The ENDBR gets
converted to NOP4 and fails this check, but the altcalls calling
old_func had their displacements adjusted by +4.

The is_endbr64() check will fail, and the 5-byte jmp will be written at
the start of the function, and corrupt the instruction stream for the
altcall()'d callers.

Let me write an incremental patch to help.


Please add Acked-by: Konrad Rzeszutek Wilk 
on the patches.


Thanks, will do!

Bjoern


Thank you


~Andrew








Re: [PATCH 1/4] livepatch: update readme to mention --xen-depends

2022-03-08 Thread Ross Lagerwall
> From: Roger Pau Monne 
> Sent: Wednesday, March 2, 2022 2:27 PM
> To: xen-devel@lists.xenproject.org 
> Cc: Ross Lagerwall ; konrad.w...@oracle.com 
> ; doe...@amazon.de ; jul...@xen.org 
> ; Andrew Cooper ; Roger Pau Monne 
> 
> Subject: [PATCH 1/4] livepatch: update readme to mention --xen-depends 
>  
> Fixes: b19df7b2c05e ('livepatch-build: Embed hypervisor build id into every 
> hotpatch')
> Signed-off-by: Roger Pau Monné 
> ---
>  README.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/README.md b/README.md
> index 653c624..b48a3df 100644
> --- a/README.md
> +++ b/README.md
> @@ -24,7 +24,7 @@ Next, build a live patch, using a patch and the source, 
> build ID, and
>  ```
>  $ cd ~/src/livepatch-build
>  $ ./livepatch-build -s ~/src/xenbuild -p ~/src/xsa.patch -o out \
> --c ~/src/xen/xen/.config --depends $BUILDID
> +-c ~/src/xen/xen/.config --depends $BUILDID --xen-depends $BUILDID
>  Building LivePatch patch: xsa
>  
>  Xen directory: /home/ross/src/xenbuild
> -- 
> 2.34.1

Reviewed-by: Ross Lagerwall 


Re: [PATCH v3 00/13] xen: drop hypercall function tables

2022-03-08 Thread Jan Beulich
On 08.03.2022 13:56, Juergen Gross wrote:
> On 08.03.22 13:50, Jan Beulich wrote:
>> On 08.03.2022 09:39, Juergen Gross wrote:
>>> On 08.03.22 09:34, Jan Beulich wrote:
 On 08.12.2021 16:55, Juergen Gross wrote:
> In order to avoid indirect function calls on the hypercall path as
> much as possible, this series removes the hypercall function tables
> and replaces the calls of the hypercall handlers via the function
> array with automatically generated call macros.
>
> Another by-product of generating the call macros is the automatic
> generation of the hypercall handler prototypes from the same database
> which is used to generate the macros.
>
> This has the additional advantage of using type-safe calls of the
> handlers and of ensuring that related handlers (e.g. PV and HVM ones)
> share the same prototypes.
>
> A very brief performance test (parallel build of the Xen hypervisor
> in a 6 vcpu guest) showed a very slim performance improvement (less
> than 1%) with the patches applied. The test was performed using a PV
> and a PVH guest.
>
> Changes in V2:
> - new patches 6, 14, 15
> - patch 7: support hypercall priorities for faster code
> - comments addressed
>
> Changes in V3:
> - patches 1 and 4 removed as already applied
> - comments addressed
>
> Juergen Gross (13):
> xen: move do_vcpu_op() to arch specific code
> xen: harmonize return types of hypercall handlers
> xen: don't include asm/hypercall.h from C sources
> xen: include compat/platform.h from hypercall.h
> xen: generate hypercall interface related code
> xen: use generated prototypes for hypercall handlers
> x86/pv-shim: don't modify hypercall table
> xen/x86: don't use hypercall table for calling compat hypercalls
> xen/x86: call hypercall handlers via generated macro
> xen/arm: call hypercall handlers via generated macro
> xen/x86: add hypercall performance counters for hvm, correct pv
> xen: drop calls_to_multicall performance counter
> tools/xenperf: update hypercall names

 As it's pretty certain now that parts of this which didn't go in yet will
 need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
 expecting a v4 instead.
>>>
>>> Yes, I was planning to spin that up soon.
>>>
>>> The main remaining question is whether we want to switch the return type
>>> of all hypercalls (or at least the ones common to all archs) not
>>> requiring to return 64-bit values to "int", as Julien requested.
>>
>> After walking through the earlier discussion (Jürgen - thanks for the link)
>> I'm inclined to say that if Arm wants their return values limited to 32 bits
>> (with exceptions where needed), so be it. But on x86 I'd rather not see us
>> change this aspect. Of course I'd much prefer if architectures didn't
>> diverge in this regard, yet then again Arm has already diverged in avoiding
>> the compat layer (in this case I view the divergence as helpful, though, as
>> it avoids unnecessary headache).
> 
> How to handle this in common code then? Have a hypercall_ret_t type
> (exact naming TBD) which is defined as long on x86 and int on Arm?
> Or use long in the handlers and check the value on Arm side to be a
> valid 32-bit signed int (this would be cumbersome for the exceptions,
> though)?

I was thinking along the lines of hypercall_ret_t, yes, but the
compiler wouldn't be helping with spotting truncation issues (we can't
reasonably enable the respective warnings, as they would trigger all
over the place). If we were to go that route, we'd rely on an initial
audit and subsequent patch review to spot issues. Therefore,
cumbersome or not, the checking approach may be the more viable one.

Then again Julien may have a better plan in mind; I'd anyway expect
him to supply details on how he thinks such a transition could be done
safely, as he was the one to request limiting to 32 bits.

Jan




Re: [PATCH v3 00/13] xen: drop hypercall function tables

2022-03-08 Thread Juergen Gross

On 08.03.22 14:42, Jan Beulich wrote:

On 08.03.2022 13:56, Juergen Gross wrote:

On 08.03.22 13:50, Jan Beulich wrote:

On 08.03.2022 09:39, Juergen Gross wrote:

On 08.03.22 09:34, Jan Beulich wrote:

On 08.12.2021 16:55, Juergen Gross wrote:

In order to avoid indirect function calls on the hypercall path as
much as possible, this series removes the hypercall function tables
and replaces the calls of the hypercall handlers via the function
array with automatically generated call macros.

Another by-product of generating the call macros is the automatic
generation of the hypercall handler prototypes from the same database
which is used to generate the macros.

This has the additional advantage of using type-safe calls of the
handlers and of ensuring that related handlers (e.g. PV and HVM ones)
share the same prototypes.

A very brief performance test (parallel build of the Xen hypervisor
in a 6 vcpu guest) showed a very slim performance improvement (less
than 1%) with the patches applied. The test was performed using a PV
and a PVH guest.

Changes in V2:
- new patches 6, 14, 15
- patch 7: support hypercall priorities for faster code
- comments addressed

Changes in V3:
- patches 1 and 4 removed as already applied
- comments addressed

Juergen Gross (13):
 xen: move do_vcpu_op() to arch specific code
 xen: harmonize return types of hypercall handlers
 xen: don't include asm/hypercall.h from C sources
 xen: include compat/platform.h from hypercall.h
 xen: generate hypercall interface related code
 xen: use generated prototypes for hypercall handlers
 x86/pv-shim: don't modify hypercall table
 xen/x86: don't use hypercall table for calling compat hypercalls
 xen/x86: call hypercall handlers via generated macro
 xen/arm: call hypercall handlers via generated macro
 xen/x86: add hypercall performance counters for hvm, correct pv
 xen: drop calls_to_multicall performance counter
 tools/xenperf: update hypercall names


As it's pretty certain now that parts of this which didn't go in yet will
need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
expecting a v4 instead.


Yes, I was planning to spin that up soon.

The main remaining question is whether we want to switch the return type
of all hypercalls (or at least the ones common to all archs) not
requiring to return 64-bit values to "int", as Julien requested.


After walking through the earlier discussion (Jürgen - thanks for the link)
I'm inclined to say that if Arm wants their return values limited to 32 bits
(with exceptions where needed), so be it. But on x86 I'd rather not see us
change this aspect. Of course I'd much prefer if architectures didn't
diverge in this regard, yet then again Arm has already diverged in avoiding
the compat layer (in this case I view the divergence as helpful, though, as
it avoids unnecessary headache).


How to handle this in common code then? Have a hypercall_ret_t type
(exact naming TBD) which is defined as long on x86 and int on Arm?
Or use long in the handlers and check the value on Arm side to be a
valid 32-bit signed int (this would be cumbersome for the exceptions,
though)?


I was thinking along the lines of hypercall_ret_t, yes, but the
compiler wouldn't be helping with spotting truncation issues (we can't
reasonably enable the respective warnings, as they would trigger all
over the place). If we were to go that route, we'd rely on an initial
audit and subsequent patch review to spot issues. Therefore,
cumbersome or not, the checking approach may be the more viable one.

Then again Julien may have a better plan in mind; I'd anyway expect
him to supply details on how he thinks such a transition could be done
safely, as he was the one to request limiting to 32 bits.


In order to have some progress I could just leave the Arm side alone
in my series. It could be added later if a solution has been agreed
on.

What do you think?


Juergen





Re: [PATCH 2/4] livepatch: improve rune for fetching of Build ID

2022-03-08 Thread Ross Lagerwall
> From: Roger Pau Monne 
> Sent: Wednesday, March 2, 2022 2:27 PM
> To: xen-devel@lists.xenproject.org 
> Cc: Ross Lagerwall ; konrad.w...@oracle.com 
> ; doe...@amazon.de ; jul...@xen.org 
> ; Andrew Cooper ; Roger Pau Monne 
> ; Roger Pau Monné 
> Subject: [PATCH 2/4] livepatch: improve rune for fetching of Build ID 
>  
> The current one is broken with my version of readelf and returns
> 'NT_GNU_BUILD_ID'.
> 
> Signed-off-by: Roger Pau Monné 
> ---
>  README.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/README.md b/README.md
> index b48a3df..948a7de 100644
> --- a/README.md
> +++ b/README.md
> @@ -16,7 +16,7 @@ $ cp -r ~/src/xen ~/src/xenbuild
>  $ cd ~/src/xen/xen
>  $ make nconfig # Make sure to set CONFIG_LIVEPATCH=y
>  $ make
> -$ BUILDID=$(readelf -Wn xen-syms | awk '/Build ID:/ {print $3}')
> +$ BUILDID=$(readelf -Wn xen-syms | sed -n -e 's/^.*Build ID: //p')
>  ```
>  
>  Next, build a live patch, using a patch and the source, build ID, and
> -- 
> 2.34.1

Reviewed-by: Ross Lagerwall 
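The new rune can be checked against a synthetic readelf-style line. The hash below is sample data (the baseline revision quoted elsewhere in this archive), not a real build's ID:

```shell
# Synthetic stand-in for one line of "readelf -Wn xen-syms" output.
sample='    Build ID: b1b89f9009f2390652e0061bd7b24fc40732bc70'

# The sed rune from the patch prints only what follows "Build ID: ",
# independently of any other note names (e.g. NT_GNU_BUILD_ID) that a
# given readelf version may print elsewhere in its output.
BUILDID=$(printf '%s\n' "$sample" | sed -n -e 's/^.*Build ID: //p')
echo "$BUILDID"   # -> b1b89f9009f2390652e0061bd7b24fc40732bc70
```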


[PATCH v3 0/2] livepatch: enable -f{function,data}-sections compiler option

2022-03-08 Thread Roger Pau Monne
Hello,

The content in v3 has been split into two patches, but is still mostly
the same. The main difference is that the first patch does a bit of
cleanup of the build logic, now that the header object file doesn't
need to be the first one passed to the linker script.

Thanks, Roger.

Roger Pau Monne (2):
  xen/build: put image header into a separate section
  livepatch: set -f{function,data}-sections compiler option

 xen/Kconfig |  4 +++
 xen/Makefile|  2 ++
 xen/arch/arm/arch.mk|  2 --
 xen/arch/arm/arm32/Makefile |  3 +--
 xen/arch/arm/arm32/head.S   |  1 +
 xen/arch/arm/arm64/Makefile |  3 +--
 xen/arch/arm/arm64/head.S   |  1 +
 xen/arch/arm/xen.lds.S  | 49 -
 xen/arch/x86/Makefile   |  5 ++--
 xen/arch/x86/arch.mk|  2 --
 xen/arch/x86/boot/head.S|  2 +-
 xen/arch/x86/xen.lds.S  | 22 ++---
 xen/common/Kconfig  |  1 +
 13 files changed, 54 insertions(+), 43 deletions(-)

-- 
2.34.1




[PATCH v3 1/2] xen/build: put image header into a separate section

2022-03-08 Thread Roger Pau Monne
So it can be explicitly placed ahead of the rest of the .text content
in the linker script (and thus the resulting image). This is a
prerequisite for further work that will add a catch-all to the text
section (.text.*).

Note that placement of the sections inside of .text is also slightly
adjusted to be more similar to the position found in the default GNU
ld linker script.

The special handling of the object file containing the header data as
the first object file passed to the linker command line can also be
removed.

While there also remove the special handling of efi/ on x86. There's
no need for the resulting object file to be passed in any special
order to the linker.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - New in this version (split from patch 2).
---
 xen/arch/arm/arch.mk|  2 --
 xen/arch/arm/arm32/Makefile |  3 +--
 xen/arch/arm/arm32/head.S   |  1 +
 xen/arch/arm/arm64/Makefile |  3 +--
 xen/arch/arm/arm64/head.S   |  1 +
 xen/arch/arm/xen.lds.S  |  8 ++--
 xen/arch/x86/Makefile   |  5 ++---
 xen/arch/x86/arch.mk|  2 --
 xen/arch/x86/boot/head.S|  2 +-
 xen/arch/x86/xen.lds.S  | 13 -
 10 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/xen/arch/arm/arch.mk b/xen/arch/arm/arch.mk
index 094b670723..58db76c4e1 100644
--- a/xen/arch/arm/arch.mk
+++ b/xen/arch/arm/arch.mk
@@ -23,5 +23,3 @@ ifeq ($(CONFIG_ARM64_ERRATUM_843419),y)
 LDFLAGS += --fix-cortex-a53-843419
 endif
 endif
-
-ALL_OBJS-y := arch/arm/$(TARGET_SUBARCH)/head.o $(ALL_OBJS-y)
diff --git a/xen/arch/arm/arm32/Makefile b/xen/arch/arm/arm32/Makefile
index 3040eabce3..520fb42054 100644
--- a/xen/arch/arm/arm32/Makefile
+++ b/xen/arch/arm/arm32/Makefile
@@ -4,11 +4,10 @@ obj-$(CONFIG_EARLY_PRINTK) += debug.o
 obj-y += domctl.o
 obj-y += domain.o
 obj-y += entry.o
+obj-y += head.o
 obj-y += insn.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-y += proc-v7.o proc-caxx.o
 obj-y += smpboot.o
 obj-y += traps.o
 obj-y += vfp.o
-
-extra-y += head.o
diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index 7a906167ef..c837d3054c 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -120,6 +120,7 @@
 
 #endif /* !CONFIG_EARLY_PRINTK */
 
+.section .text.header, "ax", %progbits
 .arm
 
 /*
diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index baa87655fa..6d507da0d4 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_EARLY_PRINTK) += debug.o
 obj-y += domctl.o
 obj-y += domain.o
 obj-y += entry.o
+obj-y += head.o
 obj-y += insn.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-y += smc.o
@@ -14,5 +15,3 @@ obj-y += smpboot.o
 obj-y += traps.o
 obj-y += vfp.o
 obj-y += vsysreg.o
-
-extra-y += head.o
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 66d862fc81..e62c48ec1c 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -133,6 +133,7 @@
 add \xb, \xb, x20
 .endm
 
+.section .text.header, "ax", %progbits
 /*.aarch64*/
 
 /*
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 08016948ab..47d09d6cf1 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -30,9 +30,13 @@ SECTIONS
   _start = .;
   .text : {
 _stext = .;/* Text section */
+   *(.text.header)
+
+   *(.text.cold .text.cold.*)
+   *(.text.unlikely .text.*_unlikely .text.unlikely.*)
+
*(.text)
-   *(.text.cold)
-   *(.text.unlikely)
+
*(.fixup)
*(.gnu.warning)
_etext = .; /* End of text section */
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 9c40e0b4d7..04065a7310 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -1,5 +1,7 @@
 obj-y += acpi/
+obj-y += boot/
 obj-y += cpu/
+obj-y += efi/
 obj-y += genapic/
 obj-$(CONFIG_GUEST) += guest/
 obj-$(CONFIG_HVM) += hvm/
@@ -77,9 +79,6 @@ obj-$(CONFIG_COMPAT) += x86_64/platform_hypercall.o
 obj-y += sysctl.o
 endif
 
-# Allows "clean" to descend into boot/
-subdir- += boot
-
 extra-y += asm-macros.i
 extra-y += xen.lds
 
diff --git a/xen/arch/x86/arch.mk b/xen/arch/x86/arch.mk
index 8e57476d65..c90e56aeab 100644
--- a/xen/arch/x86/arch.mk
+++ b/xen/arch/x86/arch.mk
@@ -117,5 +117,3 @@ endif
 
 # Set up the assembler include path properly for older toolchains.
 CFLAGS += -Wa,-I$(srctree)/include
-
-ALL_OBJS-y := arch/x86/boot/built_in.o arch/x86/efi/built_in.o $(ALL_OBJS-y)
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index dd1bea0d10..92d73345f0 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -9,7 +9,7 @@
 #include 
 #include 
 
-.text
+.section .text.header, "ax", @progbits
 .code32
 
 #define sym_offs(sym) ((sym) - __XEN_VIRT_START)
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 506bc8e404..715452aad9 100644
--- a/xen/arch/x

[PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option

2022-03-08 Thread Roger Pau Monne
If livepatching support is enabled, build the hypervisor with the
-f{function,data}-sections compiler options, which are required by the
livepatching tools to detect changes and create livepatches.

This shouldn't result in any functional change on the hypervisor
binary image, but it does require some changes in the linker script in
order to handle that each function and data item will now be placed
into its own section in object files. As a result, add catch-alls for
.text, .data and .bss in order to merge each individual item section
into the final image.

The main difference will be that .text.startup will end up being part
of .text rather than .init, and thus won't be freed. .text.exit will
also be part of .text rather than dropped. Overall this could make the
image bigger, and package some .text code in a sub-optimal way.

On Arm the .data.read_mostly needs to be moved ahead of the .data
section like it's already done on x86, so the .data.* catch-all
doesn't also include .data.read_mostly. The alignment of
.data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end
up being placed at the tail of a read-only page from the previous
section. While there move the alignment of the .data section ahead of
the section declaration, like it's done for other sections.

The benefit of having CONFIG_LIVEPATCH enable those compiler options
is that the livepatch build tools no longer need to fiddle with the
build system in order to enable them. Note the current livepatch tools
are broken after the recent build changes due to the way they attempt
to set -f{function,data}-sections.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Split the placing of the header code in a separate section to a
   pre-patch.
 - Move Kconfig option to xen/Kconfig.
 - Expand reasoning why .data.read_mostly needs to be moved on Arm.

Changes since v1:
 - Introduce CC_SPLIT_SECTIONS for selecting the compiler options.
 - Drop check for compiler options, all supported versions have them.
 - Re-arrange section placement in .text, to match the default linker
   script.
 - Introduce .text.header to contain the headers bits that must appear
   first in the final binary.
---
It seems on Arm the schedulers and hypfs .data sections should be
moved into read_mostly.
---
Tested by gitlab in order to assert I didn't introduce any regression
on Arm specially.
---
 xen/Kconfig|  4 
 xen/Makefile   |  2 ++
 xen/arch/arm/xen.lds.S | 41 +
 xen/arch/x86/xen.lds.S |  9 +
 xen/common/Kconfig |  1 +
 5 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/xen/Kconfig b/xen/Kconfig
index bcbd2758e5..d134397a0b 100644
--- a/xen/Kconfig
+++ b/xen/Kconfig
@@ -27,6 +27,10 @@ config CLANG_VERSION
 config CC_HAS_VISIBILITY_ATTRIBUTE
def_bool $(cc-option,-fvisibility=hidden)
 
+# Use -f{function,data}-sections compiler parameters
+config CC_SPLIT_SECTIONS
+   bool
+
 source "arch/$(SRCARCH)/Kconfig"
 
 config DEFCONFIG_LIST
diff --git a/xen/Makefile b/xen/Makefile
index 5c21492d6f..18a4f7e101 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -273,6 +273,8 @@ else
 CFLAGS += -fomit-frame-pointer
 endif
 
+CFLAGS-$(CONFIG_CC_SPLIT_SECTIONS) += -ffunction-sections -fdata-sections
+
 CFLAGS += -nostdinc -fno-builtin -fno-common
 CFLAGS += -Werror -Wredundant-decls -Wno-pointer-arith
 $(call cc-option-add,CFLAGS,CC,-Wvla)
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 47d09d6cf1..836da880c3 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -36,6 +36,9 @@ SECTIONS
*(.text.unlikely .text.*_unlikely .text.unlikely.*)
 
*(.text)
+#ifdef CONFIG_CC_SPLIT_SECTIONS
+   *(.text.*)
+#endif
 
*(.fixup)
*(.gnu.warning)
@@ -82,10 +85,24 @@ SECTIONS
 #endif
   _erodata = .;/* End of read-only data */
 
+  . = ALIGN(PAGE_SIZE);
+  .data.read_mostly : {
+   /* Exception table */
+   __start___ex_table = .;
+   *(.ex_table)
+   __stop___ex_table = .;
+
+   /* Pre-exception table */
+   __start___pre_ex_table = .;
+   *(.ex_table.pre)
+   __stop___pre_ex_table = .;
+
+   *(.data.read_mostly)
+  } :text
+
+  . = ALIGN(SMP_CACHE_BYTES);
   .data : {/* Data */
-   . = ALIGN(PAGE_SIZE);
*(.data.page_aligned)
-   *(.data)
. = ALIGN(8);
__start_schedulers_array = .;
*(.data.schedulers)
@@ -98,26 +115,10 @@ SECTIONS
__paramhypfs_end = .;
 #endif
 
-   *(.data.rel)
-   *(.data.rel.*)
+   *(.data .data.*)
CONSTRUCTORS
   } :text
 
-  . = ALIGN(SMP_CACHE_BYTES);
-  .data.read_mostly : {
-   /* Exception table */
-   __start___ex_table = .;
-   *(.ex_table)
-   __stop___ex_table = .;
-
-   /* Pre-exception table */
-   __start___pre_ex_table = .;
-   *(.ex_table.pre)
-   __stop___pre_ex_table = .;
-
-   *(.data.read_mostly)
-  } :text
-
   . = ALI

Re: [PATCH 3/4] livepatch: do the initial build using CROSS_COMPILE

2022-03-08 Thread Ross Lagerwall
> From: Roger Pau Monne 
> Sent: Wednesday, March 2, 2022 2:27 PM
> To: xen-devel@lists.xenproject.org 
> Cc: Ross Lagerwall ; konrad.w...@oracle.com 
> ; doe...@amazon.de ; jul...@xen.org 
> ; Andrew Cooper ; Roger Pau Monne 
> 
> Subject: [PATCH 3/4] livepatch: do the initial build using CROSS_COMPILE 
>  
> Setting it afterwards for further builds will cause the build logic to
> detect a change and thus force a rebuild of all sources.
> 
> Signed-off-by: Roger Pau Monné 
> ---
>  livepatch-build | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/livepatch-build b/livepatch-build
> index e1715ea..38a92be 100755
> --- a/livepatch-build
> +++ b/livepatch-build
> @@ -92,7 +92,6 @@ function build_special()
>  cd "${SRCDIR}" || die
>  
>  # Capture .o files from the patched build
> -export CROSS_COMPILE="${TOOLSDIR}/livepatch-gcc "
>  export LIVEPATCH_BUILD_DIR="$(pwd)/"
>  export LIVEPATCH_CAPTURE_DIR="$OUTPUT/${name}"
>  mkdir -p "$LIVEPATCH_CAPTURE_DIR"
> @@ -408,6 +407,8 @@ if [ "${SKIP}" != "build" ]; then
>  XEN_DEBUG="debug=$XEN_DEBUG"
>  fi
>  
> +export CROSS_COMPILE="${TOOLSDIR}/livepatch-gcc "
> +
>  echo "Perform full initial build with ${CPUS} CPU(s)..."
>  build_full
>  
> -- 
> 2.34.1

Reviewed-by: Ross Lagerwall 


Re: [PATCH v3 00/13] xen: drop hypercall function tables

2022-03-08 Thread Jan Beulich
On 08.03.2022 14:44, Juergen Gross wrote:
> On 08.03.22 14:42, Jan Beulich wrote:
>> On 08.03.2022 13:56, Juergen Gross wrote:
>>> On 08.03.22 13:50, Jan Beulich wrote:
 On 08.03.2022 09:39, Juergen Gross wrote:
> On 08.03.22 09:34, Jan Beulich wrote:
>> On 08.12.2021 16:55, Juergen Gross wrote:
>>> In order to avoid indirect function calls on the hypercall path as
>>> much as possible this series is removing the hypercall function tables
>>> and is replacing the hypercall handler calls via the function array
>>> by automatically generated call macros.
>>>
>>> Another by-product of generating the call macros is the automatic
>>> generating of the hypercall handler prototypes from the same data base
>>> which is used to generate the macros.
>>>
>>> This has the additional advantage of using type safe calls of the
>>> handlers and to ensure related handler (e.g. PV and HVM ones) share
>>> the same prototypes.
>>>
>>> A very brief performance test (parallel build of the Xen hypervisor
>>> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
>>> the performance with the patches applied. The test was performed using
>>> a PV and a PVH guest.
>>>
>>> Changes in V2:
>>> - new patches 6, 14, 15
>>> - patch 7: support hypercall priorities for faster code
>>> - comments addressed
>>>
>>> Changes in V3:
>>> - patches 1 and 4 removed as already applied
>>> - comments addressed
>>>
>>> Juergen Gross (13):
>>>  xen: move do_vcpu_op() to arch specific code
>>>  xen: harmonize return types of hypercall handlers
>>>  xen: don't include asm/hypercall.h from C sources
>>>  xen: include compat/platform.h from hypercall.h
>>>  xen: generate hypercall interface related code
>>>  xen: use generated prototypes for hypercall handlers
>>>  x86/pv-shim: don't modify hypercall table
>>>  xen/x86: don't use hypercall table for calling compat hypercalls
>>>  xen/x86: call hypercall handlers via generated macro
>>>  xen/arm: call hypercall handlers via generated macro
>>>  xen/x86: add hypercall performance counters for hvm, correct pv
>>>  xen: drop calls_to_multicall performance counter
>>>  tools/xenperf: update hypercall names
>>
>> As it's pretty certain now that parts of this which didn't go in yet will
>> need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
>> expecting a v4 instead.
>
> Yes, I was planning to spin that up soon.
>
> The main remaining question is whether we want to switch the return type
> of all hypercalls (or at least the ones common to all archs) not
> requiring to return 64-bit values to "int", as Julien requested.

 After walking through the earlier discussion (Jürgen - thanks for the link)
 I'm inclined to say that if Arm wants their return values limited to 32 bits
 (with exceptions where needed), so be it. But on x86 I'd rather not see us
 change this aspect. Of course I'd much prefer if architectures didn't
 diverge in this regard, yet then again Arm has already diverged in avoiding
 the compat layer (in this case I view the divergence as helpful, though, as
 it avoids unnecessary headache).
>>>
>>> How to handle this in common code then? Have a hypercall_ret_t type
>>> (exact naming TBD) which is defined as long on x86 and int on Arm?
>>> Or use long in the handlers and check the value on Arm side to be a
>>> valid 32-bit signed int (this would be cumbersome for the exceptions,
>>> though)?
>>
>> I was thinking along the lines of hypercall_ret_t, yes, but the
>> compiler wouldn't be helping with spotting truncation issues (we can't
>> reasonably enable the respective warnings, as they would trigger all
>> over the place). If we were to go that route, we'd rely on an initial
>> audit and subsequent patch review to spot issues. Therefore,
>> cumbersome or not, the checking approach may be the more viable one.
>>
>> Then again Julien may have a better plan in mind; I'd anyway expect
>> him to supply details on how he thinks such a transition could be done
>> safely, as he was the one to request limiting to 32 bits.
> 
> In order to have some progress I could just leave the Arm side alone
> in my series. It could be added later if a solution has been agreed
> on.
> 
> What do you think?

I see no issue with this if there's no other dependency on Arm following
suit.

Jan




Re: [PATCH v3 1/2] xen/build: put image header into a separate section

2022-03-08 Thread Jan Beulich
On 08.03.2022 14:49, Roger Pau Monne wrote:
> So it can be explicitly placed ahead of the rest of the .text content
> in the linker script (and thus the resulting image). This is a
> prerequisite for further work that will add a catch-all to the text
> section (.text.*).
> 
> Note that placement of the sections inside of .text is also slightly
> adjusted to be more similar to the position found in the default GNU
> ld linker script.
> 
> The special handling of the object file containing the header data as
> the first object file passed to the linker command line can also be
> removed.
> 
> While there also remove the special handling of efi/ on x86. There's
> no need for the resulting object file to be passed in any special
> order to the linker.
> 
> Signed-off-by: Roger Pau Monné 

Looks good to me, but I have one question before feeling ready to
offer R-b:

> @@ -86,8 +84,13 @@ SECTIONS
> *(.text.kexec)  /* Page aligned in the object file. */
> kexec_reloc_end = .;
>  
> -   *(.text.cold)
> -   *(.text.unlikely)
> +   *(.text.cold .text.cold.*)
> +   *(.text.unlikely .text.*_unlikely .text.unlikely.*)

What generates .text.*_unlikely? And if anything really does, why
would .text.cold not have a similar equivalent?

Jan




Re: [PATCH 4/4] livepatch: differentiate between old and new build systems

2022-03-08 Thread Ross Lagerwall
> From: Roger Pau Monne 
> Sent: Wednesday, March 2, 2022 2:27 PM
> To: xen-devel@lists.xenproject.org 
> Cc: Ross Lagerwall ; konrad.w...@oracle.com 
> ; doe...@amazon.de ; jul...@xen.org 
> ; Andrew Cooper ; Roger Pau Monne 
> 
> Subject: [PATCH 4/4] livepatch: differentiate between old and new build 
> systems 
>  
> Do not attempt to modify the build system if CFLAGS are not set in
> Rules.mk, and instead rely on CONFIG_LIVEPATCH already setting
> -f{function,data}-sections.
> 
> Signed-off-by: Roger Pau Monné 
> ---
> This depends on getting the patch to add -f{function,data}-sections
> when using CONFIG_LIVEPATCH accepted.
> ---
>  livepatch-build | 22 ++
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/livepatch-build b/livepatch-build
> index 38a92be..656cdac 100755
> --- a/livepatch-build
> +++ b/livepatch-build
> @@ -98,14 +98,20 @@ function build_special()
>  
>  # Build with special GCC flags
>  cd "${SRCDIR}/xen" || die
> -sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/' Rules.mk
> -cp -p arch/x86/Makefile arch/x86/Makefile.bak
> -sed -i 's/--section-alignment=0x20/--section-alignment=0x1000/' arch/x86/Makefile
> -# Restore timestamps to prevent spurious rebuilding
> -touch --reference=arch/x86/Makefile.bak arch/x86/Makefile
> -make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
> -sed -i 's/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/CFLAGS += -nostdinc/' Rules.mk
> -mv -f arch/x86/Makefile.bak arch/x86/Makefile
> +if grep -q 'nostdinc' Rules.mk; then
> +    # Support for old build system, attempt to set -f{function,data}-sections and rebuild
> +    sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/' Rules.mk
> +    cp -p arch/x86/Makefile arch/x86/Makefile.bak
> +    sed -i 's/--section-alignment=0x20/--section-alignment=0x1000/' arch/x86/Makefile
> +    # Restore timestamps to prevent spurious rebuilding
> +    touch --reference=arch/x86/Makefile.bak arch/x86/Makefile
> +    make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
> +    sed -i 's/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/CFLAGS += -nostdinc/' Rules.mk
> +    mv -f arch/x86/Makefile.bak arch/x86/Makefile
> +else
> +    # -f{function,data}-sections set by CONFIG_LIVEPATCH
> +    make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
> +fi
>  
>  unset LIVEPATCH_BUILD_DIR
>  unset LIVEPATCH_CAPTURE_DIR
> -- 
> 2.34.1

Reviewed-by: Ross Lagerwall 


[PATCH] x86/cet: Use dedicated NOP4 for cf_clobber

2022-03-08 Thread Andrew Cooper
For livepatching, we need to look at a potentially clobbered function and
determine whether it used to have an ENDBR64 instruction.

Use a non-default 4-byte P6 long nop, not emitted by toolchains, and introduce
the was_endbr64() predicate.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Bjoern Doebel 
CC: Michael Kurth 
CC: Martin Pohlack 

Bjoern: For the livepatching code, I think you want:

  if ( is_endbr64(...) || was_endbr64(...) )
  needed += ENDBR64_LEN;
---
 xen/arch/x86/alternative.c   | 10 +-
 xen/arch/x86/include/asm/endbr.h | 12 
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index d41eeef1bcaf..ffb1b1d960c8 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -362,7 +362,15 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
 if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
 continue;
 
-add_nops(ptr, ENDBR64_LEN);
+/*
+ * Can't use add_nops() here.  ENDBR64_POISON is specifically
+ * different to NOP4 so it can be spotted after the fact.
+ *
+ * All CET-capable hardware uses P6 NOPS (no need to plumb through
+ * ideal_nops), and doesn't require a branch to synchronise the
+ * instruction stream.
+ */
+memcpy(ptr, ENDBR64_POISON, ENDBR64_LEN);
 clobbered++;
 }
 
diff --git a/xen/arch/x86/include/asm/endbr.h b/xen/arch/x86/include/asm/endbr.h
index 6090afeb0bd8..5e1e55cb467d 100644
--- a/xen/arch/x86/include/asm/endbr.h
+++ b/xen/arch/x86/include/asm/endbr.h
@@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
 *(uint32_t *)ptr = gen_endbr64();
 }
 
+/*
+ * After clobbering ENDBR64, we may need to confirm that the site used to
+ * contain an ENDBR64 instruction.  Use an encoding which isn't the default
+ * P6_NOP4.
+ */
+#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
+
+static inline bool was_endbr64(const void *ptr)
+{
+return *(const uint32_t *)ptr == 0x001f0f66;
+}
+
 #endif /* XEN_ASM_ENDBR_H */
-- 
2.11.0




Re: [PATCH v2] x86/build: use --orphan-handling linker option if available

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 01:34:06PM +0100, Jan Beulich wrote:
> On 08.03.2022 13:11, Roger Pau Monné wrote:
> > On Tue, Mar 08, 2022 at 12:15:04PM +0100, Jan Beulich wrote:
> >> On 08.03.2022 11:12, Roger Pau Monné wrote:
> >>> On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote:
>  @@ -179,6 +188,13 @@ SECTIONS
>   #endif
>   #endif
>   
>  +#ifndef EFI
>  +  /* Retain these just for the purpose of possible analysis tools. */
>  +  DECL_SECTION(.note) {
>  +   *(.note.*)
>  +  } PHDR(note) PHDR(text)
> >>>
> >>> Wouldn't it be enough to place it in the note program header?
> >>>
> >>> The buildid note is already placed in .rodata, so any remaining notes
> >>> don't need to be in a LOAD section?
> >>
> >> All the notes will be covered by the NOTE phdr. I had this much later
> >> in the script originally, but then the NOTE phdr covered large parts of
> >> .init.*. Clearly that yields invalid notes, which analysis (or simple
> >> dumping) tools wouldn't be happy about. We might be able to add 2nd
> >> NOTE phdr, but mkelf32 assumes exactly 2 phdrs if it finds more than
> >> one, so changes there would likely be needed then (which I'd like to
> >> avoid for the moment). I'm also not sure in how far tools can be
> >> expected to look for multiple NOTE phdrs ...
> > 
> > But if we are adding a .note section now we might as well merge it
> > with .note.gnu.build-id:
> > 
> >   DECL_SECTION(.note) {
> >__note_gnu_build_id_start = .;
> >*(.note.gnu.build-id)
> >__note_gnu_build_id_end = .;
> >*(.note.*)
> >   } PHDR(note) PHDR(text)
> > 
> > And drop the .note.Xen section?
> 
> In an ideal world we likely could, yes. But do we know for sure that
> nothing recognizes the Xen notes by section name?

Wouldn't that be wrong? In the elfnotes.h file it's clearly specified
that Xen notes live in a PT_NOTE program header and have 'Xen' in the
name field. There's no requirement of them being in any specific
section.

> .note.gnu.build-id
> cannot be folded in any event - see the rule for generating note.o,
> to be used by xen.efi linking in certain cases.

Right, so we need to keep the .note.gnu.build-id section, but we could
likely fold .note.Xen into .note I think?

Or at least add a comment to mention that we don't want to fold
.note.Xen into .note in case there are tools that search for specific
Xen notes to be contained in .note.Xen.

>  +#endif
>  +
> _erodata = .;
>   
> . = ALIGN(SECTION_ALIGN);
>  @@ -266,6 +282,32 @@ SECTIONS
>  __ctors_end = .;
> } PHDR(text)
>   
>  +#ifndef EFI
>  +  /*
>  +   * With --orphan-sections=warn (or =error) we need to handle certain linker
>  +   * generated sections. These are all expected to be empty; respective
>  +   * ASSERT()s can be found towards the end of this file.
>  +   */
>  +  DECL_SECTION(.got) {
>  +   *(.got)
>  +  } PHDR(text)
>  +  DECL_SECTION(.got.plt) {
>  +   *(.got.plt)
>  +  } PHDR(text)
>  +  DECL_SECTION(.igot.plt) {
>  +   *(.igot.plt)
>  +  } PHDR(text)
>  +  DECL_SECTION(.iplt) {
>  +   *(.iplt)
>  +  } PHDR(text)
>  +  DECL_SECTION(.plt) {
>  +   *(.plt)
>  +  } PHDR(text)
>  +  DECL_SECTION(.rela) {
>  +   *(.rela.*)
>  +  } PHDR(text)
> >>>
> >>> Why do you need to explicitly place those in the text program header?
> >>
> >> I guess that's largely for consistency with all other directives. With the
> >> assertions that these need to be empty, we might get away without, as long
> >> as no linker would decide to set up another zero-size phdr for them.
> > 
> > We already set the debug sections to not be part of any program header
> > and seem to get away with it. I'm not sure how different the sections
> > handled below would be, linkers might indeed want to place them
> > regardless?
> 
> Simply because I don't know I'd like to be on the safe side. Debug sections
> can't really be taken as reference: At least GNU ld heavily special-cases
> them anyway.
> 
> > If so it might be good to add a comment that while those should be
> > empty (and thus don't end up in any program header) we assign them to
> > the text one in order to avoid the linker from creating a new program
> > header for them.
> 
> I'll add a sentence to the comment I'm already adding here.

Thanks, Roger.



Re: [PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option

2022-03-08 Thread Jan Beulich
On 08.03.2022 14:49, Roger Pau Monne wrote:
> If livepatching support is enabled build the hypervisor with
> -f{function,data}-sections compiler options, which is required by the
> livepatching tools to detect changes and create livepatches.
> 
> This shouldn't result in any functional change on the hypervisor
> binary image, but does however require some changes in the linker
> script in order to handle that each function and data item will now be
> placed into its own section in object files. As a result add catch-all
> for .text, .data and .bss in order to merge each individual item
> section into the final image.
> 
> The main difference will be that .text.startup will end up being part
> of .text rather than .init, and thus won't be freed. .text.exit will
> also be part of .text rather than dropped. Overall this could make the
> image bigger, and package some .text code in a sub-optimal way.
> 
> On Arm the .data.read_mostly needs to be moved ahead of the .data
> section like it's already done on x86, so the .data.* catch-all
> doesn't also include .data.read_mostly. The alignment of
> .data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end
> up being placed at the tail of a read-only page from the previous
> section. While there move the alignment of the .data section ahead of
> the section declaration, like it's done for other sections.
> 
> The benefit of having CONFIG_LIVEPATCH enable those compiler option
> is that the livepatch build tools no longer need to fiddle with the
> build system in order to enable them. Note the current livepatch tools
> are broken after the recent build changes due to the way they
> attempt to set -f{function,data}-sections.
> 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -88,6 +88,9 @@ SECTIONS
> *(.text.unlikely .text.*_unlikely .text.unlikely.*)
>  
> *(.text)
> +#ifdef CONFIG_CC_SPLIT_SECTIONS
> +   *(.text.*)
> +#endif
> *(.text.__x86_indirect_thunk_*)
> *(.text.page_aligned)

These last two now will not have any effect anymore when
CC_SPLIT_SECTIONS=y. This may have undesirable effects on the
overall size when there is more than one object with a
.text.page_aligned contribution. In .data ...

> @@ -292,9 +295,7 @@ SECTIONS
>  
>DECL_SECTION(.data) {
> *(.data.page_aligned)
> -   *(.data)
> -   *(.data.rel)
> -   *(.data.rel.*)
> +   *(.data .data.*)
>} PHDR(text)

... this continues to be named first. I wonder whether we wouldn't
want to use SORT_BY_ALIGNMENT (if available) instead in both places.

Jan




Re: [PATCH v3 1/2] xen/build: put image header into a separate section

2022-03-08 Thread Andrew Cooper
On 08/03/2022 13:49, Roger Pau Monne wrote:
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 08016948ab..47d09d6cf1 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -30,9 +30,13 @@ SECTIONS
>_start = .;
>.text : {
>  _stext = .;/* Text section */
> +   *(.text.header)
> +
> +   *(.text.cold .text.cold.*)
> +   *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> +
> *(.text)
> -   *(.text.cold)
> -   *(.text.unlikely)
> +

Most of this hunk wants to move into patch 2.  Patch 1 should only
introduce .text.header.

Same for x86.  Can fix on commit.

~Andrew


Re: [PATCH v2] x86/build: use --orphan-handling linker option if available

2022-03-08 Thread Jan Beulich
On 08.03.2022 15:07, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 01:34:06PM +0100, Jan Beulich wrote:
>> On 08.03.2022 13:11, Roger Pau Monné wrote:
>>> On Tue, Mar 08, 2022 at 12:15:04PM +0100, Jan Beulich wrote:
 On 08.03.2022 11:12, Roger Pau Monné wrote:
> On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote:
>> @@ -179,6 +188,13 @@ SECTIONS
>>  #endif
>>  #endif
>>  
>> +#ifndef EFI
>> +  /* Retain these just for the purpose of possible analysis tools. */
>> +  DECL_SECTION(.note) {
>> +   *(.note.*)
>> +  } PHDR(note) PHDR(text)
>
> Wouldn't it be enough to place it in the note program header?
>
> The buildid note is already placed in .rodata, so any remaining notes
> don't need to be in a LOAD section?

 All the notes will be covered by the NOTE phdr. I had this much later
 in the script originally, but then the NOTE phdr covered large parts of
 .init.*. Clearly that yields invalid notes, which analysis (or simple
 dumping) tools wouldn't be happy about. We might be able to add 2nd
 NOTE phdr, but mkelf32 assumes exactly 2 phdrs if it finds more than
 one, so changes there would likely be needed then (which I'd like to
 avoid for the moment). I'm also not sure in how far tools can be
 expected to look for multiple NOTE phdrs ...
>>>
>>> But if we are adding a .note section now we might as well merge it
>>> with .note.gnu.build-id:
>>>
>>>   DECL_SECTION(.note) {
>>>__note_gnu_build_id_start = .;
>>>*(.note.gnu.build-id)
>>>__note_gnu_build_id_end = .;
>>>*(.note.*)
>>>   } PHDR(note) PHDR(text)
>>>
>>> And drop the .note.Xen section?
>>
>> In an ideal world we likely could, yes. But do we know for sure that
>> nothing recognizes the Xen notes by section name?
> 
> Wouldn't that be wrong? In the elfnotes.h file it's clearly specified
> that Xen notes live in a PT_NOTE program header and have 'Xen' in the
> name field. There's no requirement of them being in any specific
> section.

True. But ELF also tells us to not go from section names (only), but
to consider type and attribute as well. Yet what do most tools do?

>> .note.gnu.build-id
>> cannot be folded in any event - see the rule for generating note.o,
>> to be used by xen.efi linking in certain cases.
> 
> Right, so we need to keep the .note.gnu.build-id section, but we could
> likely fold .note.Xen into .note I think?
> 
> Or at least add a comment to mention that we don't want to fold
> .note.Xen into .note in case there are tools that search for specific
> Xen notes to be contained in .note.Xen.

I can add such a comment, sure.

Jan




Re: [XEN PATCH v9 06/30] build: rework test/livepatch/Makefile

2022-03-08 Thread Ross Lagerwall
> From: Anthony PERARD 
> Sent: Tuesday, January 25, 2022 11:00 AM
> To: xen-devel@lists.xenproject.org 
> Cc: Anthony Perard ; Jan Beulich 
> ; Andrew Cooper ; George Dunlap 
> ; Julien Grall ; Stefano Stabellini 
> ; Wei Liu ; Konrad Rzeszutek Wilk 
> ; Ross Lagerwall 
> Subject: [XEN PATCH v9 06/30] build: rework test/livepatch/Makefile 
>  
> This rework the livepatch/Makefile to make it less repetitive and make
> use of the facilities. All the targets to be built are now listed in
> $(extra-y) which will allow Rules.mk to build them without the need of
> a local target in a future patch.
> 
> There are some changes/fixes in this patch:
> - when "xen-syms" is used for a target, it is added to the dependency
>   list of the target, which allow to rebuild the target when xen-syms
>   changes. But if "xen-syms" is missing, make simply fails.
> - modinfo.o wasn't removing it's $@.bin file like the other targets,
>   this is now done.
> - The command to build *.livepatch targets has been fixed to use
>   $(XEN_LDFLAGS) rather than just $(LDFLAGS) which is a fallout from
>   2740d96efdd3 ("xen/build: have the root Makefile generates the
>   CFLAGS")
> 
> make will find out the dependencies of the *.livepatch files and thus
> what to build by "looking" at the objects listed in the *-objs
> variables. The actual dependencies are generated by the new
> "multi-depend" macro.
> 
> "$(targets)" needs to be updated with the objects listed in the
> different *-objs variables to allow make to load the .*.cmd dependency
> files.
> 
> This patch copies the macro "multi_depend" from Linux 5.12, and rename
> it to "multi-depend".
> 
> Signed-off-by: Anthony PERARD 
> Acked-by: Jan Beulich 
> ---
> 

Acked-by: Ross Lagerwall 


Re: [PATCH v3 1/2] xen/build: put image header into a separate section

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 02:57:23PM +0100, Jan Beulich wrote:
> On 08.03.2022 14:49, Roger Pau Monne wrote:
> > So it can be explicitly placed ahead of the rest of the .text content
> > in the linker script (and thus the resulting image). This is a
> > prerequisite for further work that will add a catch-all to the text
> > section (.text.*).
> > 
> > Note that placement of the sections inside of .text is also slightly
> > adjusted to be more similar to the position found in the default GNU
> > ld linker script.
> > 
> > The special handling of the object file containing the header data as
> > the first object file passed to the linker command line can also be
> > removed.
> > 
> > While there also remove the special handling of efi/ on x86. There's
> > no need for the resulting object file to be passed in any special
> > order to the linker.
> > 
> > Signed-off-by: Roger Pau Monné 
> 
> Looks good to me, but I have one question before feeling ready to
> offer R-b:
> 
> > @@ -86,8 +84,13 @@ SECTIONS
> > *(.text.kexec)  /* Page aligned in the object file. */
> > kexec_reloc_end = .;
> >  
> > -   *(.text.cold)
> > -   *(.text.unlikely)
> > +   *(.text.cold .text.cold.*)
> > +   *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> 
> What generates .text.*_unlikely? And if anything really does, why
> would .text.cold not have a similar equivalent?

That matches what I saw in the default linker script from my version
of GNU ld:

*(.text.unlikely .text.*_unlikely .text.unlikely.*)

I really don't know what could generate .text.*_unlikely, but since
it's part of the default linker script I assumed it was better to just
add it.

Thanks, Roger.
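[Editor's aside: the default script Roger refers to can be inspected directly. GNU ld dumps its built-in linker script with --verbose, and the .text.unlikely input-section patterns quoted in this thread appear there verbatim (exact contents vary by binutils version; GNU ld assumed).]

```shell
# Dump the built-in script and pull out the unlikely-text patterns:
ld --verbose | grep 'text.unlikely'
```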



Re: [PATCH v3 1/2] xen/build: put image header into a separate section

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 02:11:28PM +, Andrew Cooper wrote:
> On 08/03/2022 13:49, Roger Pau Monne wrote:
> > diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> > index 08016948ab..47d09d6cf1 100644
> > --- a/xen/arch/arm/xen.lds.S
> > +++ b/xen/arch/arm/xen.lds.S
> > @@ -30,9 +30,13 @@ SECTIONS
> >_start = .;
> >.text : {
> >  _stext = .;/* Text section */
> > +   *(.text.header)
> > +
> > +   *(.text.cold .text.cold.*)
> > +   *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> > +
> > *(.text)
> > -   *(.text.cold)
> > -   *(.text.unlikely)
> > +
> 
> Most of this hunk wants to move into patch 2.  Patch 1 should only
> introduce .text.header.
> 
> Same for x86.  Can fix on commit.

The justification for doing it here is to better match the ordering
used by the default linker script from GNU ld. Obviously this also
benefits the following patch. Feel free to move, but I think it's also
fine to do it here.

Thanks, Roger.



Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Jane Malalane
On 08/03/2022 12:33, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 01:24:23PM +0100, Jan Beulich wrote:
>> On 08.03.2022 12:38, Roger Pau Monné wrote:
>>> On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
 @@ -685,13 +687,31 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
   }
   }
   
 -if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
 +if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED |
 + XEN_X86_ASSISTED_XAPIC |
 + XEN_X86_ASSISTED_X2APIC) )
   {
   dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n",
   config->arch.misc_flags);
   return -EINVAL;
   }
   
 +if ( (assisted_xapic || assisted_x2apic) && !hvm )
 +{
 +dprintk(XENLOG_INFO,
 +"Interrupt Controller Virtualization not supported for PV\n");
 +return -EINVAL;
 +}
 +
 +if ( (assisted_xapic && !assisted_xapic_available) ||
 + (assisted_x2apic && !assisted_x2apic_available) )
 +{
 +dprintk(XENLOG_INFO,
 +"Hardware assisted x%sAPIC requested but not available\n",
 +assisted_xapic && !assisted_xapic_available ? "" : "2");
 +return -EINVAL;
>>>
>>> I think for those two you could return -ENODEV if others agree.
>>
>> If by "two" you mean the xAPIC and x2APIC aspects here (and not e.g. this
>> and the earlier if()), then I agree. I'm always in favor of using distinct
>> error codes when possible and at least halfway sensible.
> 
> I would be fine by using it for the !hvm if also. IMO it makes sense
> as PV doesn't have an APIC 'device' at all, so ENODEV would seem
> fitting. EINVAL is also fine as the caller shouldn't even attempt that
> in the first place.
> 
> So let's use it for the last if only.
Wouldn't it make more sense to use -ENODEV particularly for the first? I 
agree that -ENODEV should be reported in the first case because it 
doesn't make sense to request acceleration of something that doesn't 
exist and I should have put that. But having a look at the hap code 
(since it resembles the second case), it returns -EINVAL when it is not 
available, unless you deem this to be different or, in retrospective, 
that the hap code should too have been coded to return -ENODEV.

if ( hap && !hvm_hap_supported() )
 {
 dprintk(XENLOG_INFO, "HAP requested but not available\n");
 return -EINVAL;
 }

I agree with all the other comments and have made the approp changes for 
v6. Thanks a lot.

Jane.

Re: [PATCH 1/2] Livepatch: resolve old address before function verification

2022-03-08 Thread Ross Lagerwall
> From: Bjoern Doebel 
> Sent: Monday, March 7, 2022 11:53 AM
> To: xen-devel@lists.xenproject.org 
> Cc: Michael Kurth ; Martin Pohlack ; 
> Roger Pau Monne ; Andrew Cooper 
> ; Bjoern Doebel ; Konrad 
> Rzeszutek Wilk ; Ross Lagerwall 
> 
> Subject: [PATCH 1/2] Livepatch: resolve old address before function 
> verification 
>  
> When verifying that a livepatch can be applied, we may as well want to
> inspect the target function to be patched. To do so, we need to resolve
> this function's address before running the arch-specific
> livepatch_verify hook.
> 
> Signed-off-by: Bjoern Doebel 
> CC: Konrad Rzeszutek Wilk 
> CC: Ross Lagerwall 
> ---
>  xen/common/livepatch.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> index ec301a9f12..be2cf75c2d 100644
> --- a/xen/common/livepatch.c
> +++ b/xen/common/livepatch.c
> @@ -684,11 +684,11 @@ static int prepare_payload(struct payload *payload,
>  return -EINVAL;
>  }
>  
> -rc = arch_livepatch_verify_func(f);
> +rc = resolve_old_address(f, elf);
>  if ( rc )
>  return rc;
>  
> -rc = resolve_old_address(f, elf);
> +rc = arch_livepatch_verify_func(f);
>  if ( rc )
>  return rc;
>  
> -- 
> 2.32.0

Reviewed-by: Ross Lagerwall 


Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber

2022-03-08 Thread Jan Beulich
On 08.03.2022 15:01, Andrew Cooper wrote:
> For livepatching, we need to look at a potentially clobbered function and
> determine whether it used to have an ENDBR64 instruction.
> 
> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and introduce
> the was_endbr64() predicate.

Did you consider using ENDBR32 for this purpose? I'm worried that
the pattern you picked is still too close to what might be used
(down the road) by a tool chain. If ENDBR32 isn't suitable for
some reason, how about "nop %sp" or "nopw (%rsp)" (and then maybe
even "data16" substituted by rex, cs, ds, es, or ss)?

One neat thing about ENDBR32 would be that you wouldn't even
need memcpy() - you'd merely flip the low main opcode bit.

> Bjoern: For the livepatching code, I think you want:
> 
>   if ( is_endbr64(...) || was_endbr64(...) )
>   needed += ENDBR64_LEN;

Looks like I didn't fully understand the problem then from your
initial description. The adjustment here (and the one needed in
Björn's patch) is to compensate for the advancing of the
targets of altcalls past the ENDBR?

> --- a/xen/arch/x86/include/asm/endbr.h
> +++ b/xen/arch/x86/include/asm/endbr.h
> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>  *(uint32_t *)ptr = gen_endbr64();
>  }
>  
> +/*
> + * After clobbering ENDBR64, we may need to confirm that the site used to
> + * contain an ENDBR64 instruction.  Use an encoding which isn't the default
> + * P6_NOP4.
> + */
> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */

In case this remains as is - did you mean "opsz" instead of "osp"?
But this really is "nopw (%rax)" anyway.

Jan
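
For illustration, the is_endbr64()/was_endbr64() pair under discussion boils down to a 4-byte compare against the two encodings. This is a standalone sketch; the real Xen helpers are built differently (e.g. constructing the constant at runtime so a live ENDBR64 encoding is not embedded in the binary), so treat the macro names and shape as assumptions:

```c
#include <stdbool.h>
#include <string.h>

/* endbr64 encodes as F3 0F 1E FA; the poison is the non-default P6 NOP4. */
#define ENDBR64_BYTES  "\xf3\x0f\x1e\xfa"
#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* nopw (%rax) */

static bool is_endbr64(const void *ptr)
{
    return memcmp(ptr, ENDBR64_BYTES, 4) == 0;
}

/* True if cf_clobber replaced an ENDBR64 at this site with the poison NOP4. */
static bool was_endbr64(const void *ptr)
{
    return memcmp(ptr, ENDBR64_POISON, 4) == 0;
}
```

A patch site then needs extra room whenever either predicate fires, which is exactly the `is_endbr64(...) || was_endbr64(...)` check quoted above.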




Re: [PATCH 4/4] livepatch: differentiate between old and new build systems

2022-03-08 Thread Andrew Cooper
On 02/03/2022 14:27, Roger Pau Monne wrote:
> diff --git a/livepatch-build b/livepatch-build
> index 38a92be..656cdac 100755
> --- a/livepatch-build
> +++ b/livepatch-build
> @@ -98,14 +98,20 @@ function build_special()
>  
>  # Build with special GCC flags
>  cd "${SRCDIR}/xen" || die
> -sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc -ffunction-sections 
> -fdata-sections/' Rules.mk
> -cp -p arch/x86/Makefile arch/x86/Makefile.bak
> -sed -i 's/--section-alignment=0x20/--section-alignment=0x1000/' 
> arch/x86/Makefile
> -# Restore timestamps to prevent spurious rebuilding
> -touch --reference=arch/x86/Makefile.bak arch/x86/Makefile
> -make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
> -sed -i 's/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/CFLAGS 
> += -nostdinc/' Rules.mk
> -mv -f arch/x86/Makefile.bak arch/x86/Makefile
> +if grep -q 'nostdinc' Rules.mk; then
> + # Support for old build system, attempt to set 
> -f{function,data}-sections and rebuild
> +sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc 
> -ffunction-sections -fdata-sections/' Rules.mk
> +cp -p arch/x86/Makefile arch/x86/Makefile.bak
> +sed -i 's/--section-alignment=0x20/--section-alignment=0x1000/' 
> arch/x86/Makefile
> +# Restore timestamps to prevent spurious rebuilding
> +touch --reference=arch/x86/Makefile.bak arch/x86/Makefile
> +make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" 
> || die
> +sed -i 's/CFLAGS += -nostdinc -ffunction-sections 
> -fdata-sections/CFLAGS += -nostdinc/' Rules.mk
> +mv -f arch/x86/Makefile.bak arch/x86/Makefile
> +else
> +# -f{function,data}-sections set by CONFIG_LIVEPATCH
> +make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" 
> || die
> +fi

This really ought to be the other way around, by spotting the thing we
know is good, and then falling back to the heuristics.  In light of the
updates to the Xen side, something like:

if grep -q CC_SPLIT_SECTIONS Kconfig; then
    # -f{function,data}-sections set by CONFIG_LIVEPATCH
    make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log"
|| die
else
    # Support for old build system, attempt to set
-f{function,data}-sections and rebuild
    ...
fi

?
~Andrew



Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Jan Beulich
On 08.03.2022 15:31, Jane Malalane wrote:
> On 08/03/2022 12:33, Roger Pau Monné wrote:
>> On Tue, Mar 08, 2022 at 01:24:23PM +0100, Jan Beulich wrote:
>>> On 08.03.2022 12:38, Roger Pau Monné wrote:
 On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
> @@ -685,13 +687,31 @@ int arch_sanitise_domain_config(struct 
> xen_domctl_createdomain *config)
>   }
>   }
>   
> -if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
> +if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED |
> + XEN_X86_ASSISTED_XAPIC |
> + XEN_X86_ASSISTED_X2APIC) )
>   {
>   dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n",
>   config->arch.misc_flags);
>   return -EINVAL;
>   }
>   
> +if ( (assisted_xapic || assisted_x2apic) && !hvm )
> +{
> +dprintk(XENLOG_INFO,
> +"Interrupt Controller Virtualization not supported for 
> PV\n");
> +return -EINVAL;
> +}
> +
> +if ( (assisted_xapic && !assisted_xapic_available) ||
> + (assisted_x2apic && !assisted_x2apic_available) )
> +{
> +dprintk(XENLOG_INFO,
> +"Hardware assisted x%sAPIC requested but not 
> available\n",
> +assisted_xapic && !assisted_xapic_available ? "" : "2");
> +return -EINVAL;

 I think for those two you could return -ENODEV if others agree.
>>>
>>> If by "two" you mean the xAPIC and x2APIC aspects here (and not e.g. this
>>> and the earlier if()), then I agree. I'm always in favor of using distinct
>>> error codes when possible and at least halfway sensible.
>>
>> I would be fine by using it for the !hvm if also. IMO it makes sense
>> as PV doesn't have an APIC 'device' at all, so ENODEV would seem
>> fitting. EINVAL is also fine as the caller shouldn't even attempt that
>> in the first place.
>>
>> So let's use it for the last if only.
> Wouldn't it make more sense to use -ENODEV particularly for the first? I 
> agree that -ENODEV should be reported in the first case because it 
> doesn't make sense to request acceleration of something that doesn't 
> exist and I should have put that. But having a look at the hap code 
> (since it resembles the second case), it returns -EINVAL when it is not 
> available, unless you deem this to be different or, in retrospect, 
> that the hap code should too have been coded to return -ENODEV.
> 
> if ( hap && !hvm_hap_supported() )
>  {
>  dprintk(XENLOG_INFO, "HAP requested but not available\n");
>  return -EINVAL;
>  }

This is just one of the examples where using -ENODEV as you suggest
would introduce an inconsistency. We use -EINVAL also for other
purely guest-type dependent checks.

Jan




Re: [PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 03:09:17PM +0100, Jan Beulich wrote:
> On 08.03.2022 14:49, Roger Pau Monne wrote:
> > If livepatching support is enabled build the hypervisor with
> > -f{function,data}-sections compiler options, which is required by the
> > livepatching tools to detect changes and create livepatches.
> > 
> > This shouldn't result in any functional change on the hypervisor
> > binary image, but does however require some changes in the linker
> > script in order to handle that each function and data item will now be
> > placed into its own section in object files. As a result add catch-all
> > for .text, .data and .bss in order to merge each individual item
> > section into the final image.
> > 
> > The main difference will be that .text.startup will end up being part
> > of .text rather than .init, and thus won't be freed. .text.exit will
> > also be part of .text rather than dropped. Overall this could make the
> > image bigger, and package some .text code in a sub-optimal way.
> > 
> > On Arm the .data.read_mostly needs to be moved ahead of the .data
> > section like it's already done on x86, so the .data.* catch-all
> > doesn't also include .data.read_mostly. The alignment of
> > .data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end
> > up being placed at the tail of a read-only page from the previous
> > section. While there move the alignment of the .data section ahead of
> > the section declaration, like it's done for other sections.
> > 
> > The benefit of having CONFIG_LIVEPATCH enable those compiler options
> > is that the livepatch build tools no longer need to fiddle with the
> > build system in order to enable them. Note the current livepatch tools
> > are broken after the recent build changes due to the way they
> > attempt to set -f{function,data}-sections.
> > 
> > Signed-off-by: Roger Pau Monné 
> 
> Reviewed-by: Jan Beulich 
> 
> > --- a/xen/arch/x86/xen.lds.S
> > +++ b/xen/arch/x86/xen.lds.S
> > @@ -88,6 +88,9 @@ SECTIONS
> > *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> >  
> > *(.text)
> > +#ifdef CONFIG_CC_SPLIT_SECTIONS
> > +   *(.text.*)
> > +#endif
> > *(.text.__x86_indirect_thunk_*)
> > *(.text.page_aligned)
> 
> These last two now will not have any effect anymore when
> CC_SPLIT_SECTIONS=y. This may have undesirable effects on the
> overall size when there is more than one object with a
> .text.page_aligned contribution. In .data ...

Agreed. I wondered whether to move those ahead of the main text
section, so likely:

   *(.text.unlikely .text.*_unlikely .text.unlikely.*)

   *(.text.page_aligned)
   *(.text.__x86_indirect_thunk_*)
   *(.text)
#ifdef CONFIG_CC_SPLIT_SECTIONS
   *(.text.*)
#endif

FWIW, Linux seems fine to package .text.page_aligned together with the
rest of .text using the .text.[0-9a-zA-Z_]* catch-all.

> > @@ -292,9 +295,7 @@ SECTIONS
> >  
> >DECL_SECTION(.data) {
> > *(.data.page_aligned)
> > -   *(.data)
> > -   *(.data.rel)
> > -   *(.data.rel.*)
> > +   *(.data .data.*)
> >} PHDR(text)
> 
> ... this continues to be named first. I wonder whether we wouldn't
> want to use SORT_BY_ALIGNMENT (if available) instead in both places.

We could use the command line option if available
(--sort-section=alignment) to sort all wildcard sections?

Thanks, Roger.
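
For reference, the two alternatives floated here would look roughly as follows. This is a sketch reusing the DECL_SECTION/PHDR macros from the quoted xen.lds.S; SORT_BY_ALIGNMENT needs a binutils feature check before it can be relied on:

```
  DECL_SECTION(.data) {
       *(.data.page_aligned)
       /* Sort the wildcard matches by alignment to minimise padding. */
       *(SORT_BY_ALIGNMENT(.data .data.*))
  } PHDR(text)
```

Roger's alternative keeps the script untouched and instead passes `--sort-section=alignment` on the linker command line, which sorts all wildcard input sections globally rather than per section list.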



Re: [PATCH 4/4] livepatch: differentiate between old and new build systems

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 02:38:47PM +, Andrew Cooper wrote:
> On 02/03/2022 14:27, Roger Pau Monne wrote:
> > diff --git a/livepatch-build b/livepatch-build
> > index 38a92be..656cdac 100755
> > --- a/livepatch-build
> > +++ b/livepatch-build
> > @@ -98,14 +98,20 @@ function build_special()
> >  
> >  # Build with special GCC flags
> >  cd "${SRCDIR}/xen" || die
> > -sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc -ffunction-sections 
> > -fdata-sections/' Rules.mk
> > -cp -p arch/x86/Makefile arch/x86/Makefile.bak
> > -sed -i 's/--section-alignment=0x20/--section-alignment=0x1000/' 
> > arch/x86/Makefile
> > -# Restore timestamps to prevent spurious rebuilding
> > -touch --reference=arch/x86/Makefile.bak arch/x86/Makefile
> > -make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || 
> > die
> > -sed -i 's/CFLAGS += -nostdinc -ffunction-sections 
> > -fdata-sections/CFLAGS += -nostdinc/' Rules.mk
> > -mv -f arch/x86/Makefile.bak arch/x86/Makefile
> > +if grep -q 'nostdinc' Rules.mk; then
> > + # Support for old build system, attempt to set 
> > -f{function,data}-sections and rebuild
> > +sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc 
> > -ffunction-sections -fdata-sections/' Rules.mk
> > +cp -p arch/x86/Makefile arch/x86/Makefile.bak
> > +sed -i 
> > 's/--section-alignment=0x20/--section-alignment=0x1000/' 
> > arch/x86/Makefile
> > +# Restore timestamps to prevent spurious rebuilding
> > +touch --reference=arch/x86/Makefile.bak arch/x86/Makefile
> > +make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" 
> > || die
> > +sed -i 's/CFLAGS += -nostdinc -ffunction-sections 
> > -fdata-sections/CFLAGS += -nostdinc/' Rules.mk
> > +mv -f arch/x86/Makefile.bak arch/x86/Makefile
> > +else
> > +# -f{function,data}-sections set by CONFIG_LIVEPATCH
> > +make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" 
> > || die
> > +fi
> 
> This really ought to be the other way around, by spotting the thing we
> know is good, and then falling back to the heuristics.  In light of the
> updates to the Xen side, something like:

I'm not sure I agree. I do prefer to spot the 'bad' one, and just
fallback to expecting Xen to correctly set -f{function,data}-sections
otherwise.

> if grep -q CC_SPLIT_SECTIONS Kconfig; then

Because this logic ties us to not moving CC_SPLIT_SECTIONS from being
defined in xen/Kconfig (or even changing its name), and again ties the
livepatch tools to internal details about the Xen build system.

Thanks, Roger.



Re: [PATCH v3 0/2] livepatch: enable -f{function,data}-sections compiler option

2022-03-08 Thread Julien Grall

Hi,

On 08/03/2022 13:49, Roger Pau Monne wrote:

Hello,

The content in v3 has been split in two patches, but is still mostly the
same. The main difference is that first patch does a bit of cleanup of
the build logic now that the header object file doesn't need to be the
first one passed to the linker script.

Thanks, Roger.

Roger Pau Monne (2):
   xen/build: put image header into a separate section
   livepatch: set -f{function,data}-sections compiler option


For the Arm bits:

Acked-by: Julien Grall  # xen/arm

Cheers,



  xen/Kconfig |  4 +++
  xen/Makefile|  2 ++
  xen/arch/arm/arch.mk|  2 --
  xen/arch/arm/arm32/Makefile |  3 +--
  xen/arch/arm/arm32/head.S   |  1 +
  xen/arch/arm/arm64/Makefile |  3 +--
  xen/arch/arm/arm64/head.S   |  1 +
  xen/arch/arm/xen.lds.S  | 49 -
  xen/arch/x86/Makefile   |  5 ++--
  xen/arch/x86/arch.mk|  2 --
  xen/arch/x86/boot/head.S|  2 +-
  xen/arch/x86/xen.lds.S  | 22 ++---
  xen/common/Kconfig  |  1 +
  13 files changed, 54 insertions(+), 43 deletions(-)



--
Julien Grall



Re: [PATCH v3 1/2] xen/build: put image header into a separate section

2022-03-08 Thread Jan Beulich
On 08.03.2022 15:18, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 02:57:23PM +0100, Jan Beulich wrote:
>> On 08.03.2022 14:49, Roger Pau Monne wrote:
>>> So it can be explicitly placed ahead of the rest of the .text content
>>> in the linker script (and thus the resulting image). This is a
>>> prerequisite for further work that will add a catch-all to the text
>>> section (.text.*).
>>>
>>> Note that placement of the sections inside of .text is also slightly
>>> adjusted to be more similar to the position found in the default GNU
>>> ld linker script.
>>>
>>> The special handling of the object file containing the header data as
>>> the first object file passed to the linker command line can also be
>>> removed.
>>>
>>> While there also remove the special handling of efi/ on x86. There's
>>> no need for the resulting object file to be passed in any special
>>> order to the linker.
>>>
>>> Signed-off-by: Roger Pau Monné 
>>
>> Looks good to me, but I have one question before feeling ready to
>> offer R-b:
>>
>>> @@ -86,8 +84,13 @@ SECTIONS
>>> *(.text.kexec)  /* Page aligned in the object file. */
>>> kexec_reloc_end = .;
>>>  
>>> -   *(.text.cold)
>>> -   *(.text.unlikely)
>>> +   *(.text.cold .text.cold.*)
>>> +   *(.text.unlikely .text.*_unlikely .text.unlikely.*)
>>
>> What generates .text.*_unlikely? And if anything really does, why
>> would .text.cold not have a similar equivalent?
> 
> That matches what I saw in the default linker script from my version
> of GNU ld:
> 
> *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> 
> I really don't know what could generate .text.*_unlikely, but since
> it's part of the default linker script I assumed it was better to just
> add it.

I've checked - gcc up to 4.5.x would generate .text.*_unlikely; from
4.6.x. onwards it would be .text.unlikely.*.

As to the dissimilarity with .text.cold: I wonder why we have that in
the first place. It matches our __cold attribute, just that we don't
use that anywhere afaics.

In any event:
Reviewed-by: Jan Beulich 
albeit preferably with .text.cold.* dropped again.

Jan




Re: [PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option

2022-03-08 Thread Jan Beulich
On 08.03.2022 15:46, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 03:09:17PM +0100, Jan Beulich wrote:
>> On 08.03.2022 14:49, Roger Pau Monne wrote:
>>> If livepatching support is enabled build the hypervisor with
>>> -f{function,data}-sections compiler options, which is required by the
>>> livepatching tools to detect changes and create livepatches.
>>>
>>> This shouldn't result in any functional change on the hypervisor
>>> binary image, but does however require some changes in the linker
>>> script in order to handle that each function and data item will now be
>>> placed into its own section in object files. As a result add catch-all
>>> for .text, .data and .bss in order to merge each individual item
>>> section into the final image.
>>>
>>> The main difference will be that .text.startup will end up being part
>>> of .text rather than .init, and thus won't be freed. .text.exit will
>>> also be part of .text rather than dropped. Overall this could make the
>>> image bigger, and package some .text code in a sub-optimal way.
>>>
>>> On Arm the .data.read_mostly needs to be moved ahead of the .data
>>> section like it's already done on x86, so the .data.* catch-all
>>> doesn't also include .data.read_mostly. The alignment of
>>> .data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end
>>> up being placed at the tail of a read-only page from the previous
>>> section. While there move the alignment of the .data section ahead of
>>> the section declaration, like it's done for other sections.
>>>
> >> The benefit of having CONFIG_LIVEPATCH enable those compiler options
> >> is that the livepatch build tools no longer need to fiddle with the
> >> build system in order to enable them. Note the current livepatch tools
> >> are broken after the recent build changes due to the way they
> >> attempt to set -f{function,data}-sections.
>>>
>>> Signed-off-by: Roger Pau Monné 
>>
>> Reviewed-by: Jan Beulich 
>>
>>> --- a/xen/arch/x86/xen.lds.S
>>> +++ b/xen/arch/x86/xen.lds.S
>>> @@ -88,6 +88,9 @@ SECTIONS
>>> *(.text.unlikely .text.*_unlikely .text.unlikely.*)
>>>  
>>> *(.text)
>>> +#ifdef CONFIG_CC_SPLIT_SECTIONS
>>> +   *(.text.*)
>>> +#endif
>>> *(.text.__x86_indirect_thunk_*)
>>> *(.text.page_aligned)
>>
>> These last two now will not have any effect anymore when
>> CC_SPLIT_SECTIONS=y. This may have undesirable effects on the
>> overall size when there is more than one object with a
>> .text.page_aligned contribution. In .data ...
> 
> Agreed. I wondered whether to move those ahead of the main text
> section, so likely:
> 
>*(.text.unlikely .text.*_unlikely .text.unlikely.*)
> 
>*(.text.page_aligned)
>*(.text.__x86_indirect_thunk_*)
>*(.text)
> #ifdef CONFIG_CC_SPLIT_SECTIONS
>*(.text.*)
> #endif

Perhaps; I'm not really worried about .text.__x86_indirect_thunk_*,
though. When adding .text.* that one can likely go away.

> FWIW, Linux seems fine to package .text.page_aligned together with the
> rest of .text using the .text.[0-9a-zA-Z_]* catch-all.

There's no question this is functionally fine. The question is how
many extra padding areas are inserted because of this.

>>> @@ -292,9 +295,7 @@ SECTIONS
>>>  
>>>DECL_SECTION(.data) {
>>> *(.data.page_aligned)
>>> -   *(.data)
>>> -   *(.data.rel)
>>> -   *(.data.rel.*)
>>> +   *(.data .data.*)
>>>} PHDR(text)
>>
>> ... this continues to be named first. I wonder whether we wouldn't
>> want to use SORT_BY_ALIGNMENT (if available) instead in both places.
> 
> We could use the command line option if available
> (--sort-section=alignment) to sort all wildcard sections?

Depends on the scope of the sorting that would result when enabled
globally like this.

Jan




Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber

2022-03-08 Thread Andrew Cooper
On 08/03/2022 14:37, Jan Beulich wrote:
> On 08.03.2022 15:01, Andrew Cooper wrote:
>> For livepatching, we need to look at a potentially clobbered function and
>> determine whether it used to have an ENDBR64 instruction.
>>
>> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and 
>> introduce
>> the was_endbr64() predicate.
> Did you consider using ENDBR32 for this purpose?

No, and no because that's very short sighted.  Even 4 non-nops would be
better than ENDBR32, because they wouldn't create actually-usable
codepaths in corner cases we care to exclude.

> I'm worried that
> the pattern you picked is still too close to what might be used
> (down the road) by a tool chain.

This is what Linux are doing too, but no - I'm not worried.  The
encoding isn't the only protection; toolchains also have no reason to
put a nop4 at the head of functions; nop5 is the common one to find.

> One neat thing about ENDBR32 would be that you wouldn't even
> need memcpy() - you'd merely flip the low main opcode bit.

Not relevant.  You're taking the SMC pipeline hit for any sized write,
and a single movl is far less cryptic.

>> Bjoern: For the livepatching code, I think you want:
>>
>>   if ( is_endbr64(...) || was_endbr64(...) )
>>   needed += ENDBR64_LEN;
> Looks like I didn't fully understand the problem then from your
> initial description. The adjustment here (and the one needed in
> Björn's patch) is to compensate for the advancing of the
> targets of altcalls past the ENDBR?

No.  Consider this scenario:

callee:
    endbr64
    ...

altcall:
    call *foo(%rip)

During boot, we rewrite altcall to be `call callee+4` and turn endbr64
into nops, so it now looks like:

callee:
    nop4
    ...

altcall:
    call callee+4

Then we want to livepatch callee to callee_new, so we get

callee_new:
    endbr64
    ...

in the livepatch itself.

Now, to actually patch, we need to memcpy(callee+4, "jmp callee_new", 5).

The livepatch logic calling is_endbr(callee) doesn't work because it's
now a nop4, which is why it needs a was_endbr64(callee) too.

>
>> --- a/xen/arch/x86/include/asm/endbr.h
>> +++ b/xen/arch/x86/include/asm/endbr.h
>> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>>  *(uint32_t *)ptr = gen_endbr64();
>>  }
>>  
>> +/*
>> + * After clobbering ENDBR64, we may need to confirm that the site used to
>> + * contain an ENDBR64 instruction.  Use an encoding which isn't the default
>> + * P6_NOP4.
>> + */
>> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
> In case this remains as is - did you mean "opsz" instead of "osp"?
> But this really is "nopw (%rax)" anyway.

Oh, osp is the nasm name.  I'll switch to nopw.

~Andrew
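
The jump-target arithmetic in the scenario above can be written out as a standalone sketch (illustrative constants and addresses; in Xen this logic lives in arch_livepatch_apply(), with patch_offset kept in the metadata area discussed in Björn's patch):

```c
#include <stdint.h>

#define ENDBR64_LEN          4
#define ARCH_PATCH_INSN_SIZE 5  /* E9 opcode + rel32 */

/*
 * rel32 operand of the "jmp callee_new" written at old_addr + patch_offset.
 * patch_offset is ENDBR64_LEN when the callee starts (or, after cf_clobber,
 * used to start) with ENDBR64, and 0 otherwise.
 */
static int32_t jmp_rel32(uintptr_t old_addr, uintptr_t new_addr,
                         unsigned int patch_offset)
{
    /* Displacement is relative to the end of the 5-byte JMP instruction. */
    return (int32_t)(new_addr - (old_addr + patch_offset +
                                 ARCH_PATCH_INSN_SIZE));
}
```

With callee at 0x1000 carrying a (clobbered) ENDBR64 and callee_new at 0x2000, the JMP lands at 0x1004 and the displacement is 0x2000 - 0x1009.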


[PATCH v4 1/2] Livepatch: resolve old address before function verification

2022-03-08 Thread Bjoern Doebel
When verifying that a livepatch can be applied, we may well want to
inspect the target function to be patched. To do so, we need to resolve
this function's address before running the arch-specific
livepatch_verify hook.

Signed-off-by: Bjoern Doebel 
Acked-by: Konrad Rzeszutek Wilk 
Reviewed-by: Ross Lagerwall 
---
 xen/common/livepatch.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index ec301a9f12..be2cf75c2d 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -684,11 +684,11 @@ static int prepare_payload(struct payload *payload,
 return -EINVAL;
 }
 
-rc = arch_livepatch_verify_func(f);
+rc = resolve_old_address(f, elf);
 if ( rc )
 return rc;
 
-rc = resolve_old_address(f, elf);
+rc = arch_livepatch_verify_func(f);
 if ( rc )
 return rc;
 
-- 
2.32.0




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH v4 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Bjoern Doebel
Xen enabled CET for supporting architectures. The control flow aspect of
CET expects functions that can be called indirectly (i.e., via function
pointers) to start with an ENDBR64 instruction. Otherwise a control flow
exception is raised.

This expectation breaks livepatching flows because we patch functions by
overwriting their first 5 bytes with a JMP + <offset>, thus breaking the
ENDBR64. We fix this by checking the start of a patched function for
being ENDBR64. In the positive case we move the livepatch JMP to start
behind the ENDBR64 instruction.

To avoid having to guess the ENDBR64 offset again on patch reversal
(which might race with other mechanisms adding/removing ENDBR
dynamically), use the livepatch metadata to store the computed offset
along with the saved bytes of the overwritten function.

Signed-off-by: Bjoern Doebel 
Acked-by: Konrad Rzeszutek Wilk 
CC: Ross Lagerwall 

Note that on top of livepatching functions, Xen supports an additional
mode where we can "remove" a function by overwriting it with NOPs. This
is only supported for functions up to 31 bytes in size and this patch
reduces this limit to 30 bytes.

Changes since r1:
* use sizeof_field() to avoid unused variable warning
* make metadata variable const in arch_livepatch_revert
* rebase on top and make use of Andrew Cooper's was_endbr64() patch
---
 xen/arch/x86/livepatch.c | 61 ++--
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
index 65530c1e57..5380e18bd9 100644
--- a/xen/arch/x86/livepatch.c
+++ b/xen/arch/x86/livepatch.c
@@ -14,11 +14,29 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
 #include 
 
+/*
+ * CET hotpatching support: We may have functions starting with an ENDBR64
+ * instruction that MUST remain the first instruction of the function, hence
+ * we need to move any hotpatch trampoline further into the function. For that
+ * we need to keep track of the patching offset used for any loaded hotpatch
+ * (to avoid racing against other fixups adding/removing ENDBR64 or similar
+ * instructions).
+ *
+ * We do so by making use of the existing opaque metadata area. We use its
+ * first byte to track the offset into the function used for patching and
+ * the remainder of the data to store overwritten code bytes.
+ */
+struct x86_livepatch_meta {
+uint8_t patch_offset;
+uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)];
+};
+
 static bool has_active_waitqueue(const struct vm_event_domain *ved)
 {
 /* ved may be xzalloc()'d without INIT_LIST_HEAD() yet. */
@@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void)
 
 int arch_livepatch_verify_func(const struct livepatch_func *func)
 {
+BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != LIVEPATCH_OPAQUE_SIZE);
+
 /* If NOPing.. */
 if ( !func->new_addr )
 {
 /* Only do up to maximum amount we can put in the ->opaque. */
-if ( func->new_size > sizeof(func->opaque) )
+if ( func->new_size > sizeof_field(struct x86_livepatch_meta,
+   instruction) )
 return -EOPNOTSUPP;
 
 if ( func->old_size < func->new_size )
 return -EINVAL;
 }
-else if ( func->old_size < ARCH_PATCH_INSN_SIZE )
-return -EINVAL;
+else
+{
+/*
+ * Space needed now depends on whether the target function
+ * starts with an ENDBR64 instruction.
+ */
+uint8_t needed;
+
+needed = ARCH_PATCH_INSN_SIZE;
+if ( is_endbr64(func->old_addr) || was_endbr64(func->old_addr) )
+needed += ENDBR64_LEN;
+
+if ( func->old_size < needed )
+return -EINVAL;
+}
 
 return 0;
 }
@@ -127,15 +161,21 @@ int arch_livepatch_verify_func(const struct 
livepatch_func *func)
 void noinline arch_livepatch_apply(struct livepatch_func *func)
 {
 uint8_t *old_ptr;
-uint8_t insn[sizeof(func->opaque)];
+struct x86_livepatch_meta *lp;
+uint8_t insn[sizeof(lp->instruction)];
 unsigned int len;
 
+lp = (struct x86_livepatch_meta *)func->opaque;
+lp->patch_offset = 0;
 old_ptr = func->old_addr;
 len = livepatch_insn_len(func);
 if ( !len )
 return;
 
-memcpy(func->opaque, old_ptr, len);
+if ( is_endbr64(old_ptr) )
+lp->patch_offset += ENDBR64_LEN;
+
+memcpy(lp->instruction, old_ptr + lp->patch_offset, len);
 if ( func->new_addr )
 {
 int32_t val;
@@ -143,14 +183,15 @@ void noinline arch_livepatch_apply(struct livepatch_func 
*func)
 BUILD_BUG_ON(ARCH_PATCH_INSN_SIZE != (1 + sizeof(val)));
 
 insn[0] = 0xe9; /* Relative jump. */
-val = func->new_addr - func->old_addr - ARCH_PATCH_INSN_SIZE;
+val = func->new_addr - (func->old_addr + lp->patch_offset
++ ARCH_PATCH_INSN_SIZE);
 
 memcpy(&insn[1], &val, sizeof(val));
 }
 e

Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Ross Lagerwall
> From: Bjoern Doebel 
> Sent: Tuesday, March 8, 2022 10:29 AM
> To: xen-devel@lists.xenproject.org 
> Cc: Michael Kurth ; Martin Pohlack ; 
> Roger Pau Monne ; Andrew Cooper 
> ; Bjoern Doebel ; Konrad 
> Rzeszutek Wilk ; Ross Lagerwall 
> 
> Subject: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced 
> functions 
>  
> Xen enabled CET for supporting architectures. The control flow aspect of
> CET expects functions that can be called indirectly (i.e., via function
> pointers) to start with an ENDBR64 instruction. Otherwise a control flow
> exception is raised.
> 
> This expectation breaks livepatching flows because we patch functions by
> overwriting their first 5 bytes with a JMP + <offset>, thus breaking the
> ENDBR64. We fix this by checking the start of a patched function for
> being ENDBR64. In the positive case we move the livepatch JMP to start
> behind the ENDBR64 instruction.
> 
> To avoid having to guess the ENDBR64 offset again on patch reversal
> (which might race with other mechanisms adding/removing ENDBR
> dynamically), use the livepatch metadata to store the computed offset
> along with the saved bytes of the overwritten function.
> 
> Signed-off-by: Bjoern Doebel 
> CC: Konrad Rzeszutek Wilk 
> CC: Ross Lagerwall 
> 
> Note that on top of livepatching functions, Xen supports an additional
> mode where we can "remove" a function by overwriting it with NOPs. This
> is only supported for functions up to 31 bytes in size and this patch
> reduces this limit to 30 bytes.
> 
> Changes since r1:
> * use sizeof_field() to avoid unused variable warning
> * make metadata variable const in arch_livepatch_revert
> ---
>  xen/arch/x86/livepatch.c | 61 ++--
>  1 file changed, 53 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
> index 65530c1e57..0fd97f2a00 100644
> --- a/xen/arch/x86/livepatch.c
> +++ b/xen/arch/x86/livepatch.c
> @@ -14,11 +14,29 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  
> +/*
> + * CET hotpatching support: We may have functions starting with an ENDBR64
> + * instruction that MUST remain the first instruction of the function, hence
> + * we need to move any hotpatch trampoline further into the function. For 
> that
> + * we need to keep track of the patching offset used for any loaded hotpatch
> + * (to avoid racing against other fixups adding/removing ENDBR64 or similar
> + * instructions).
> + *
> + * We do so by making use of the existing opaque metadata area. We use its
> + * first byte to track the offset into the function used for patching and
> + * the remainder of the data to store overwritten code bytes.
> + */
> +struct x86_livepatch_meta {
> +uint8_t patch_offset;
> +uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)];
> +};
> +

I think it would make things a bit simpler to use one of the spare _pad bits
from struct livepatch_func rather than hacking it into the opaque area. Is
there a reason you chose this approach?


[PATCH 4.16] VT-d: drop undue address-of from check_cleanup_domid_map()

2022-03-08 Thread Jan Beulich
For an unknown reason I added back the operator while backporting,
despite 4.16 having c06e3d810314 ("VT-d: per-domain IOMMU bitmap needs
to have dynamic size"). I can only assume that I mistakenly took the
4.15 backport as basis and/or reference.

Fixes: fa45f6b5560e ("VT-d: split domid map cleanup check into a function")
Signed-off-by: Jan Beulich 

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -197,7 +197,7 @@ static void check_cleanup_domid_map(stru
 
 if ( !found )
 {
-clear_bit(iommu->index, &dom_iommu(d)->arch.vtd.iommu_bitmap);
+clear_bit(iommu->index, dom_iommu(d)->arch.vtd.iommu_bitmap);
 cleanup_domid_map(d, iommu);
 }
 }




Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber

2022-03-08 Thread Jan Beulich
On 08.03.2022 16:19, Andrew Cooper wrote:
> On 08/03/2022 14:37, Jan Beulich wrote:
>> On 08.03.2022 15:01, Andrew Cooper wrote:
>>> For livepatching, we need to look at a potentially clobbered function and
>>> determine whether it used to have an ENDBR64 instruction.
>>>
>>> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and 
>>> introduce
>>> the was_endbr64() predicate.
>> Did you consider using ENDBR32 for this purpose?
> 
> No, and no because that's very short sighted.  Even 4 non-nops would be
> better than ENDBR32, because they wouldn't create actually-usable
> codepaths in corner cases we care to exclude.

Well - I thought of ENDBR32 because elsewhere you said we don't
need to be worried about any byte sequence resembling that insn,
since for it to become "usable" an attacker would first need to
have managed to manufacture a 32-bit ring0 code segment. Now you
say effectively the opposite.

>> I'm worried that
>> the pattern you picked is still too close to what might be used
>> (down the road) by a tool chain.
> 
> This is what Linux are doing too, but no - I'm not worried.  The
> encoding isn't the only protection; toolchains also have no reason to
> put a nop4 at the head of functions; nop5 is the common one to find.

Well, okay - let's hope for the best then.

>>> Bjoern: For the livepatching code, I think you want:
>>>
>>>   if ( is_endbr64(...) || was_endbr64(...) )
>>>   needed += ENDBR64_LEN;
>> Looks like I didn't fully understand the problem then from your
>> initial description. The adjustment here (and the one needed in
>> Björn's patch) is to compensate for the advancing of the
>> targets of altcalls past the ENDBR?
> 
> No.  Consider this scenario:
> 
> callee:
>     endbr64
>     ...
> 
> altcall:
>     call *foo(%rip)
> 
> During boot, we rewrite altcall to be `call callee+4` and turn endbr64
> into nops, so it now looks like:
> 
> callee:
>     nop4
>     ...
> 
> altcall:
>     call callee+4
> 
> Then we want to livepatch callee to callee_new, so we get
> 
> callee_new:
>     endbr64
>     ...
> 
> in the livepatch itself.
> 
> Now, to actually patch, we need to memcpy(callee+4, "jmp callee_new", 5).
> 
> The livepatch logic calling is_endbr(callee) doesn't work because it's
> now a nop4, which is why it needs a was_endbr64(callee) too.

Sounds like exactly what I was thinking of. Perhaps my description
wasn't sufficiently clear / unambiguous then.

>>> --- a/xen/arch/x86/include/asm/endbr.h
>>> +++ b/xen/arch/x86/include/asm/endbr.h
>>> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>>>  *(uint32_t *)ptr = gen_endbr64();
>>>  }
>>>  
>>> +/*
>>> + * After clobbering ENDBR64, we may need to confirm that the site used to
>>> + * contain an ENDBR64 instruction.  Use an encoding which isn't the default
>>> + * P6_NOP4.
>>> + */
>>> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
>> In case this remains as is - did you mean "opsz" instead of "osp"?
>> But this really is "nopw (%rax)" anyway.
> 
> Oh, osp is the nasm name.  I'll switch to nopw.

Reviewed-by: Jan Beulich 

Jan




[RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations

2022-03-08 Thread Bjoern Doebel
Comparing a signed 64bit integer to a signed 32 bit integer may lead to
unexpected overflows. Adjust the cast to use the same type.

Signed-off-by: Bjoern Doebel 
CC: Konrad Rzeszutek Wilk 
CC: Ross Lagerwall 
---
I need some input here. When testing the CET-BIT livepatch updates I
noticed that my generated livepatch would not load due to

(XEN) livepatch: vmx: Overflow in relocation 1 in .rela.altinstructions for 
.altinstructions

A deeper look revealed that the ELF relocation adjustment seems to be
going wrong and that in fact the lower 32bit of the compared values in
my case were identical, but that the cast to int64_t for the value
pulled in extra 32 bits, which turned out to be different.

Applying this patch fixed the issue for my example and I got a fully
working livepatch. However, I do not understand what is actually going
on here, so I'm sending this RFC to get extra eyes on the code.
---
 xen/arch/x86/livepatch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
index 59620b8a4f..5380e18bd9 100644
--- a/xen/arch/x86/livepatch.c
+++ b/xen/arch/x86/livepatch.c
@@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
 
 val -= (uint64_t)dest;
 *(int32_t *)dest = val;
-if ( (int64_t)val != *(int32_t *)dest )
+if ( (int32_t)val != *(int32_t *)dest )
 {
 printk(XENLOG_ERR LIVEPATCH "%s: Overflow in relocation %u in 
%s for %s\n",
elf->name, i, rela->name, base->name);
-- 
2.32.0




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






Re: [PATCH v4 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Jan Beulich
On 08.03.2022 16:23, Bjoern Doebel wrote:
> @@ -127,15 +161,21 @@ int arch_livepatch_verify_func(const struct 
> livepatch_func *func)
>  void noinline arch_livepatch_apply(struct livepatch_func *func)
>  {
>  uint8_t *old_ptr;
> -uint8_t insn[sizeof(func->opaque)];
> +struct x86_livepatch_meta *lp;
> +uint8_t insn[sizeof(lp->instruction)];
>  unsigned int len;
>  
> +lp = (struct x86_livepatch_meta *)func->opaque;
> +lp->patch_offset = 0;
>  old_ptr = func->old_addr;
>  len = livepatch_insn_len(func);
>  if ( !len )
>  return;
>  
> -memcpy(func->opaque, old_ptr, len);
> +if ( is_endbr64(old_ptr) )
> +lp->patch_offset += ENDBR64_LEN;

Don't you need to use was_endbr64() here as well?

Jan




Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Doebel, Bjoern



On 08.03.22 16:25, Ross Lagerwall wrote:

CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.




From: Bjoern Doebel 
Sent: Tuesday, March 8, 2022 10:29 AM
To: xen-devel@lists.xenproject.org 
Cc: Michael Kurth ; Martin Pohlack ; Roger Pau Monne 
; Andrew Cooper ; Bjoern Doebel ; 
Konrad Rzeszutek Wilk ; Ross Lagerwall 
Subject: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced 
functions

Xen enabled CET for supporting architectures. The control flow aspect of
CET expects functions that can be called indirectly (i.e., via function
pointers) to start with an ENDBR64 instruction. Otherwise a control flow
exception is raised.

This expectation breaks livepatching flows because we patch functions by
overwriting their first 5 bytes with a JMP + , thus breaking the
ENDBR64. We fix this by checking the start of a patched function for
being ENDBR64. In the positive case we move the livepatch JMP to start
behind the ENDBR64 instruction.

To avoid having to guess the ENDBR64 offset again on patch reversal
(which might race with other mechanisms adding/removing ENDBR
dynamically), use the livepatch metadata to store the computed offset
along with the saved bytes of the overwritten function.

Signed-off-by: Bjoern Doebel 
CC: Konrad Rzeszutek Wilk 
CC: Ross Lagerwall 

Note that on top of livepatching functions, Xen supports an additional
mode where we can "remove" a function by overwriting it with NOPs. This
is only supported for functions up to 31 bytes in size and this patch
reduces this limit to 30 bytes.

Changes since r1:
* use sizeof_field() to avoid unused variable warning
* make metadata variable const in arch_livepatch_revert
---
  xen/arch/x86/livepatch.c | 61 ++--
  1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
index 65530c1e57..0fd97f2a00 100644
--- a/xen/arch/x86/livepatch.c
+++ b/xen/arch/x86/livepatch.c
@@ -14,11 +14,29 @@
  #include 
  #include 

+#include 
  #include 
  #include 
  #include 
  #include 

+/*
+ * CET hotpatching support: We may have functions starting with an ENDBR64
+ * instruction that MUST remain the first instruction of the function, hence
+ * we need to move any hotpatch trampoline further into the function. For that
+ * we need to keep track of the patching offset used for any loaded hotpatch
+ * (to avoid racing against other fixups adding/removing ENDBR64 or similar
+ * instructions).
+ *
+ * We do so by making use of the existing opaque metadata area. We use its
+ * first 4 bytes to track the offset into the function used for patching and
+ * the remainder of the data to store overwritten code bytes.
+ */
+struct x86_livepatch_meta {
+uint8_t patch_offset;
+uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)];
+};
+


I think it would make things a bit simpler to use one of the spare _pad bits
from struct livepatch_func rather than hacking it into the opaque area. Is
there a reason you chose this approach?


No specific reason. Are you suggesting updating the public livepatch 
interface to add a new member and reduce the padding size by 1 byte? I 
guess that will also require a patch to livepatch-build-tools?


Bjoern







Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Jane Malalane
On 08/03/2022 11:38, Roger Pau Monné wrote:
> On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
>> Introduce a new per-domain creation x86 specific flag to
>> select whether hardware assisted virtualization should be used for
>> x{2}APIC.
>>
>> A per-domain option is added to xl in order to select the usage of
>> x{2}APIC hardware assisted virtualization, as well as a global
>> configuration option.
>>
>> Having all APIC interaction exit to Xen for emulation is slow and can
>> induce much overhead. Hardware can speed up x{2}APIC by decoding the
>> APIC access and providing a VM exit with a more specific exit reason
>> than a regular EPT fault or by altogether avoiding a VM exit.
>>
>> On the other hand, being able to disable x{2}APIC hardware assisted
>> virtualization can be useful for testing and debugging purposes.
>>
>> Note: vmx_install_vlapic_mapping doesn't require modifications
>> regardless of whether the guest has "Virtualize APIC accesses" enabled
>> or not, i.e., setting the APIC_ACCESS_ADDR VMCS field is fine so long
>> as virtualize_apic_accesses is supported by the CPU.
>>
>> Suggested-by: Andrew Cooper 
>> Signed-off-by: Jane Malalane 
>> ---
>> CC: Wei Liu 
>> CC: Anthony PERARD 
>> CC: Juergen Gross 
>> CC: Andrew Cooper 
>> CC: George Dunlap 
>> CC: Jan Beulich 
>> CC: Julien Grall 
>> CC: Stefano Stabellini 
>> CC: Christian Lindig 
>> CC: David Scott 
>> CC: Volodymyr Babchuk 
>> CC: "Roger Pau Monné" 
>>
>> v5:
>> * Revert v4 changes in vmx_vlapic_msr_changed(), preserving the use of
>>the has_assisted_x{2}apic macros
>> * Following changes in assisted_x{2}apic_available definitions in
>>patch 1, retighten conditionals for setting
>>XEN_HVM_CPUID_APIC_ACCESS_VIRT and XEN_HVM_CPUID_X2APIC_VIRT in
>>cpuid_hypervisor_leaves()
>>
>> v4:
>>   * Add has_assisted_x{2}apic macros and use them where appropriate
>>   * Replace CPU checks with per-domain assisted_x{2}apic control
>> options in vmx_vlapic_msr_changed() and cpuid_hypervisor_leaves(),
>> following edits to assisted_x{2}apic_available definitions in
>> patch 1
>> Note: new assisted_x{2}apic_available definitions make later
>> cpu_has_vmx_apic_reg_virt and cpu_has_vmx_virtual_intr_delivery
>> checks redundant in vmx_vlapic_msr_changed()
>>
>> v3:
>>   * Change info in xl.cfg to better express reality and fix
>> capitalization of x{2}apic
>>   * Move "physinfo" variable definition to the beginning of
>> libxl__domain_build_info_setdefault()
>>   * Reposition brackets in if statement to match libxl coding style
>>   * Shorten logic in libxl__arch_domain_build_info_setdefault()
>>   * Correct dprintk message in arch_sanitise_domain_config()
>>   * Make appropriate changes in vmx_vlapic_msr_changed() and
>> cpuid_hypervisor_leaves() for amended "assisted_x2apic" bit
>>   * Remove unneeded parentheses
>>
>> v2:
>>   * Add a LIBXL_HAVE_ASSISTED_APIC macro
>>   * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo
>>   * Add a return statement in now "int"
>> libxl__arch_domain_build_info_setdefault()
>>   * Preserve libxl__arch_domain_build_info_setdefault 's location in
>> libxl_create.c
>>   * Correct x{2}apic default setting logic in
>> libxl__arch_domain_prepare_config()
>>   * Correct logic for parsing assisted_x{2}apic host/guest options in
>> xl_parse.c and initialize them to -1 in xl.c
>>   * Use guest options directly in vmx_vlapic_msr_changed
>>   * Fix indentation of bool assisted_x{2}apic in struct hvm_domain
>>   * Add a change in xenctrl_stubs.c to pass xenctrl ABI checks
>> ---
>>   docs/man/xl.cfg.5.pod.in| 19 +++
>>   docs/man/xl.conf.5.pod.in   | 12 
>>   tools/golang/xenlight/helpers.gen.go| 12 
>>   tools/golang/xenlight/types.gen.go  |  2 ++
>>   tools/include/libxl.h   |  7 +++
>>   tools/libs/light/libxl_arch.h   |  5 +++--
>>   tools/libs/light/libxl_arm.c|  7 +--
>>   tools/libs/light/libxl_create.c | 22 +-
>>   tools/libs/light/libxl_types.idl|  2 ++
>>   tools/libs/light/libxl_x86.c| 28 ++--
>>   tools/ocaml/libs/xc/xenctrl.ml  |  2 ++
>>   tools/ocaml/libs/xc/xenctrl.mli |  2 ++
>>   tools/ocaml/libs/xc/xenctrl_stubs.c |  2 +-
>>   tools/xl/xl.c   |  8 
>>   tools/xl/xl.h   |  2 ++
>>   tools/xl/xl_parse.c | 16 
>>   xen/arch/x86/domain.c   | 28 +++-
>>   xen/arch/x86/hvm/vmx/vmcs.c |  4 
>>   xen/arch/x86/hvm/vmx/vmx.c  | 13 -
>>   xen/arch/x86/include/asm/hvm/domain.h   |  6 ++
>>   xen/arch/x86/include/asm/hvm/vmx/vmcs.h |  3 +++
>>   xen/arch/x86/traps.c|  9 +
>>   xen/include/public/arch-x86/xen.h   |  2 ++
>> 

Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations

2022-03-08 Thread Jan Beulich
On 08.03.2022 16:36, Bjoern Doebel wrote:
> --- a/xen/arch/x86/livepatch.c
> +++ b/xen/arch/x86/livepatch.c
> @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
>  
>  val -= (uint64_t)dest;
>  *(int32_t *)dest = val;

Afaict after this assignment ...

> -if ( (int64_t)val != *(int32_t *)dest )
> +if ( (int32_t)val != *(int32_t *)dest )

... this condition can never be false. The cast really wants to be
to int64_t, and the overflow you saw being reported is quite likely
for a different reason. But from the sole message you did quote
it's not really possible to figure what else is wrong.

Jan




Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Ross Lagerwall
> From: Doebel, Bjoern 
> Sent: Tuesday, March 8, 2022 3:41 PM
> To: Ross Lagerwall ; 
> xen-devel@lists.xenproject.org 
> Cc: Michael Kurth ; Martin Pohlack ; 
> Roger Pau Monne ; Andrew Cooper 
> ; Konrad Rzeszutek Wilk 
> Subject: Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced 
> functions 
>  
> 
> On 08.03.22 16:25, Ross Lagerwall wrote:
> > 
> > 
> > 
> >> From: Bjoern Doebel 
> >> Sent: Tuesday, March 8, 2022 10:29 AM
> >> To: xen-devel@lists.xenproject.org 
> >> Cc: Michael Kurth ; Martin Pohlack ; 
> >> Roger Pau Monne ; Andrew Cooper 
> >> ; Bjoern Doebel ; Konrad 
> >> Rzeszutek Wilk ; Ross Lagerwall 
> >> 
> >> Subject: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced 
> >> functions
> >>
> >> Xen enabled CET for supporting architectures. The control flow aspect of
> >> CET expects functions that can be called indirectly (i.e., via function
> >> pointers) to start with an ENDBR64 instruction. Otherwise a control flow
> >> exception is raised.
> >>
> >> This expectation breaks livepatching flows because we patch functions by
> >> overwriting their first 5 bytes with a JMP + , thus breaking the
> >> ENDBR64. We fix this by checking the start of a patched function for
> >> being ENDBR64. In the positive case we move the livepatch JMP to start
> >> behind the ENDBR64 instruction.
> >>
> >> To avoid having to guess the ENDBR64 offset again on patch reversal
> >> (which might race with other mechanisms adding/removing ENDBR
> >> dynamically), use the livepatch metadata to store the computed offset
> >> along with the saved bytes of the overwritten function.
> >>
> >> Signed-off-by: Bjoern Doebel 
> >> CC: Konrad Rzeszutek Wilk 
> >> CC: Ross Lagerwall 
> >> 
> >> Note that on top of livepatching functions, Xen supports an additional
> >> mode where we can "remove" a function by overwriting it with NOPs. This
> >> is only supported for functions up to 31 bytes in size and this patch
> >> reduces this limit to 30 bytes.
> >>
> >> Changes since r1:
> >> * use sizeof_field() to avoid unused variable warning
> >> * make metadata variable const in arch_livepatch_revert
> >> ---
> >>   xen/arch/x86/livepatch.c | 61 ++--
> >>   1 file changed, 53 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
> >> index 65530c1e57..0fd97f2a00 100644
> >> --- a/xen/arch/x86/livepatch.c
> >> +++ b/xen/arch/x86/livepatch.c
> >> @@ -14,11 +14,29 @@
> >>   #include 
> >>   #include 
> >>
> >> +#include 
> >>   #include 
> >>   #include 
> >>   #include 
> >>   #include 
> >>
> >> +/*
> >> + * CET hotpatching support: We may have functions starting with an ENDBR64
> >> + * instruction that MUST remain the first instruction of the function, 
> >> hence
> >> + * we need to move any hotpatch trampoline further into the function. For 
> >> that
> >> + * we need to keep track of the patching offset used for any loaded 
> >> hotpatch
> >> + * (to avoid racing against other fixups adding/removing ENDBR64 or 
> >> similar
> >> + * instructions).
> >> + *
> >> + * We do so by making use of the existing opaque metadata area. We use its
> >> + * first 4 bytes to track the offset into the function used for patching 
> >> and
> >> + * the remainder of the data to store overwritten code bytes.
> >> + */
> >> +struct x86_livepatch_meta {
> >> +uint8_t patch_offset;
> >> +uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)];
> >> +};
> >> +
> > 
> > I think it would make things a bit simpler to use one of the spare _pad bits
> > from struct livepatch_func rather than hacking it into the opaque area. Is
> > there a reason you chose this approach?
> 
> No specific reason. Are you suggesting updating the public livepatch 
> interface to add a new member and reduce the padding size by 1 byte? I 
> guess that will also require a patch to livepatch-build-tools?
> 
> Bjoern

struct livepatch_func contains info that the build tool needs to
communicate to Xen as well as space for Xen's internal book keeping
while the live patch is applied. This includes the array for storing
instructions, the applied flag, and now additionally the patch offset.
(It's a somewhat odd arrangement but it's what we've got...)

The build tool does not need to know the details about any of Xen's internal
book keeping. So my preference would be to have patch_offset as an additional
member next to applied (reducing padding by 1) and then in livepatch-build-tools
replace:

unsigned char pad[MAX_REPLACEMENT_SIZE];
uint8_t applied;
uint8_t _pad[7];

with simply:

uint8_t opaque[39];


What do you think?

Ross


Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 03:44:18PM +, Jane Malalane wrote:
> On 08/03/2022 11:38, Roger Pau Monné wrote:
> > On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
> >> diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h 
> >> b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> >> index 9119aa8536..5b7d662ed7 100644
> >> --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> >> +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> >> @@ -220,6 +220,9 @@ void vmx_vmcs_reload(struct vcpu *v);
> >>   #define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x8000
> >>   extern u32 vmx_cpu_based_exec_control;
> >>   
> >> +#define has_assisted_xapic(d)   ((d)->arch.hvm.assisted_xapic)
> >> +#define has_assisted_x2apic(d)  ((d)->arch.hvm.assisted_x2apic)
> > 
> > Those macros should not be in an Intel specific header,
> > arch/x86/include/asm/hvm/domain.h is likely a better place.
> 
> Actually do you think hvm.h could be better?

I guess that's also fine, I did suggest hvm/domain.h because that's
where the fields get declared. I guess you prefer hvm.h because there
are other HVM related helpers in there?

Thanks, Roger.



Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber

2022-03-08 Thread Andrew Cooper
On 08/03/2022 15:36, Jan Beulich wrote:
> On 08.03.2022 16:19, Andrew Cooper wrote:
>> On 08/03/2022 14:37, Jan Beulich wrote:
>>> On 08.03.2022 15:01, Andrew Cooper wrote:
 For livepatching, we need to look at a potentially clobbered function and
 determine whether it used to have an ENDBR64 instruction.

 Use a non-default 4-byte P6 long nop, not emitted by toolchains, and 
 introduce
 the was_endbr64() predicate.
>>> Did you consider using ENDBR32 for this purpose?
>> No, and no because that's very short sighted.  Even 4 non-nops would be
>> better than ENDBR32, because they wouldn't create actually-usable
>> codepaths in corner cases we care to exclude.
> Well - I thought of ENDBR32 because elsewhere you said we don't
> need to be worried about any byte sequence resembling that insn,
> since for it to become "usable" an attacker would first need to
> have managed to manufacture a 32-bit ring0 code segment. Now you
> say effectively the opposite.

We've got ~0 risk of having any embedded ENDBR32's because we never
refer to the opcode, and therefore adding 2x 0.7s delay to scan the
binary on build is a waste.  If the check were free, it would be a
different matter.

At any point, if we were to introduce references to ENDBR32, we'd want
to start scanning for embedded sequences.

But at no point do we want to intentionally remove our defence in depth
created by both having no CS32 code segment, and no ENDBR32 instructions.

>
>>> I'm worried that
>>> the pattern you picked is still too close to what might be used
>>> (down the road) by a tool chain.
>> This is what Linux are doing too, but no - I'm not worried.  The
>> encoding isn't the only protection; toolchains also have no reason to
>> put a nop4 at the head of functions; nop5 is the common one to find.
> Well, okay - let's hope for the best then.
>
 Bjoern: For the livepatching code, I think you want:

   if ( is_endbr64(...) || was_endbr64(...) )
   needed += ENDBR64_LEN;
>>> Looks like I didn't fully understand the problem then from your
>>> initial description. The adjustment here (and the one needed in
>>> Björn's patch) is to compensate for the advancing of the
>>> targets of altcalls past the ENDBR?
>> No.  Consider this scenario:
>>
>> callee:
>>     endbr64
>>     ...
>>
>> altcall:
>>     call *foo(%rip)
>>
>> During boot, we rewrite altcall to be `call callee+4` and turn endbr64
>> into nops, so it now looks like:
>>
>> callee:
>>     nop4
>>     ...
>>
>> altcall:
>>     call callee+4
>>
>> Then we want to livepatch callee to callee_new, so we get
>>
>> callee_new:
>>     endbr64
>>     ...
>>
>> in the livepatch itself.
>>
>> Now, to actually patch, we need to memcpy(callee+4, "jmp callee_new", 5).
>>
>> The livepatch logic calling is_endbr(callee) doesn't work because it's
>> now a nop4, which is why it needs a was_endbr64(callee) too.
> Sounds like exactly what I was thinking of. Perhaps my description
> wasn't sufficiently clear / unambiguous then.

Ah yes - I think I did misinterpret what you wrote.  I hope everything
is clear now.

>
 --- a/xen/arch/x86/include/asm/endbr.h
 +++ b/xen/arch/x86/include/asm/endbr.h
 @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
  *(uint32_t *)ptr = gen_endbr64();
  }
  
 +/*
 + * After clobbering ENDBR64, we may need to confirm that the site used to
 + * contain an ENDBR64 instruction.  Use an encoding which isn't the 
 default
 + * P6_NOP4.
 + */
 +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
>>> In case this remains as is - did you mean "opsz" instead of "osp"?
>>> But this really is "nopw (%rax)" anyway.
>> Oh, osp is the nasm name.  I'll switch to nopw.
> Reviewed-by: Jan Beulich 

Thanks.

~Andrew


Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Jane Malalane
On 08/03/2022 16:02, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 03:44:18PM +, Jane Malalane wrote:
>> On 08/03/2022 11:38, Roger Pau Monné wrote:
>>> On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
 diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h 
 b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
 index 9119aa8536..5b7d662ed7 100644
 --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
 +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
 @@ -220,6 +220,9 @@ void vmx_vmcs_reload(struct vcpu *v);
#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x8000
extern u32 vmx_cpu_based_exec_control;

 +#define has_assisted_xapic(d)   ((d)->arch.hvm.assisted_xapic)
 +#define has_assisted_x2apic(d)  ((d)->arch.hvm.assisted_x2apic)
>>>
>>> Those macros should not be in an Intel specific header,
>>> arch/x86/include/asm/hvm/domain.h is likely a better place.
>>
>> Actually do you think hvm.h could be better?
> 
> I guess that's also fine, I did suggest hvm/domain.h because that's
> where the fields get declared. I guess you prefer hvm.h because there
> are other HVM related helpers in there?

Yeah, that is why - tsc_scaling_ratio also gets defined in domain.h, for 
e.g.
Thank you for pointing this out,

Jane.

Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 04:45:34PM +0100, Jan Beulich wrote:
> On 08.03.2022 16:36, Bjoern Doebel wrote:
> > --- a/xen/arch/x86/livepatch.c
> > +++ b/xen/arch/x86/livepatch.c
> > @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf 
> > *elf,
> >  
> >  val -= (uint64_t)dest;
> >  *(int32_t *)dest = val;
> 
> Afaict after this assignment ...
> 
> > -if ( (int64_t)val != *(int32_t *)dest )
> > +if ( (int32_t)val != *(int32_t *)dest )
> 
> ... this condition can never be false. The cast really wants to be
> to int64_t, and the overflow you saw being reported is quite likely
> for a different reason. But from the sole message you did quote
> it's not really possible to figure what else is wrong.

It seems Linux has that check ifdef'ed [0], but there's no reference
as to why it's that way (I've tracked it back to the x86-64 import on
the historical tree, f3081f5b66a06175ff2dabfe885a53fb04e71076).

It's a 64bit relocation using a 32bit value, but it's unclear to me
that modifying the top 32bits is not allowed/intended.

Regards, Roger.

[0] https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/module.c#L190



Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Jane Malalane
On 08/03/2022 16:02, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 03:44:18PM +, Jane Malalane wrote:
>> On 08/03/2022 11:38, Roger Pau Monné wrote:
>>> On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
 diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h 
 b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
 index 9119aa8536..5b7d662ed7 100644
 --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
 +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
 @@ -220,6 +220,9 @@ void vmx_vmcs_reload(struct vcpu *v);
#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x8000
extern u32 vmx_cpu_based_exec_control;

 +#define has_assisted_xapic(d)   ((d)->arch.hvm.assisted_xapic)
 +#define has_assisted_x2apic(d)  ((d)->arch.hvm.assisted_x2apic)
>>>
>>> Those macros should not be in an Intel specific header,
>>> arch/x86/include/asm/hvm/domain.h is likely a better place.
>>
>> Actually do you think hvm.h could be better?
> 
> I guess that's also fine, I did suggest hvm/domain.h because that's
> where the fields get declared. I guess you prefer hvm.h because there
> are other HVM related helpers in there?

Yeah, that is why - tsc_scaling_ratio also gets defined in domain.h for e.g.

Thanks again,

Jane.

Re: [PATCH] x86/kexec: Fix kexec-reboot with CET active

2022-03-08 Thread Andrew Cooper
On 08/03/2022 08:15, Jan Beulich wrote:
> On 07.03.2022 21:53, Andrew Cooper wrote:
>> --- a/xen/arch/x86/machine_kexec.c
>> +++ b/xen/arch/x86/machine_kexec.c
>> @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image)
>>   */
>>  local_irq_disable();
>>  
>> +/* Reset CPUID masking and faulting to the host's default. */
>> +ctxt_switch_levelling(NULL);
>> +
>> +/* Disable CET. */
>> +if ( read_cr4() & X86_CR4_CET )
>> +{
>> +wrmsrl(MSR_S_CET, 0);
>> +write_cr4(read_cr4() & ~X86_CR4_CET);
>> +}
>> +
>>  /* Now regular interrupts are disabled, we need to reduce the impact
>>   * of interrupts not disabled by 'cli'.
>>   *
> Besides introducing somewhat of a disconnect between the comment in
> context here and the earlier local_irq_disable(), is it really
> necessary to do both actions with IRQs off?

We are a handful of instructions away from discarding Xen's context
entirely.  IRQs are not a relevant concern.

If we're nitpicking, irqs want to be off before kexecing gets set,
because absolutely nothing good can come of handling interrupts later
than that point.

~Andrew


Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 04:16:21PM +, Jane Malalane wrote:
> On 08/03/2022 16:02, Roger Pau Monné wrote:
> > On Tue, Mar 08, 2022 at 03:44:18PM +, Jane Malalane wrote:
> >> On 08/03/2022 11:38, Roger Pau Monné wrote:
> >>> On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
>  diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h 
>  b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
>  index 9119aa8536..5b7d662ed7 100644
>  --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
>  +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
>  @@ -220,6 +220,9 @@ void vmx_vmcs_reload(struct vcpu *v);
> #define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x8000
> extern u32 vmx_cpu_based_exec_control;
> 
>  +#define has_assisted_xapic(d)   ((d)->arch.hvm.assisted_xapic)
>  +#define has_assisted_x2apic(d)  ((d)->arch.hvm.assisted_x2apic)
> >>>
> >>> Those macros should not be in an Intel specific header,
> >>> arch/x86/include/asm/hvm/domain.h is likely a better place.
> >>
> >> Actually do you think hvm.h could be better?
> > 
> > I guess that's also fine, I did suggest hvm/domain.h because that's
> > where the fields get declared. I guess you prefer hvm.h because there
> > are other HVM related helpers in there?
> 
> Yeah, that is why - tsc_scaling_ratio also gets defined in domain.h for e.g.

I'm fine with placing it in hvm.h.

Thanks, Roger.



Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 05:15:33PM +0100, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 04:45:34PM +0100, Jan Beulich wrote:
> > On 08.03.2022 16:36, Bjoern Doebel wrote:
> > > --- a/xen/arch/x86/livepatch.c
> > > +++ b/xen/arch/x86/livepatch.c
> > > @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf 
> > > *elf,
> > >  
> > >  val -= (uint64_t)dest;
> > >  *(int32_t *)dest = val;
> > 
> > Afaict after this assignment ...
> > 
> > > -if ( (int64_t)val != *(int32_t *)dest )
> > > +if ( (int32_t)val != *(int32_t *)dest )
> > 
> > ... this condition can never be false. The cast really wants to be
> > to int64_t, and the overflow you saw being reported is quite likely
> > for a different reason. But from the sole message you did quote
> > it's not really possible to figure what else is wrong.
> 
> It seems Linux has that check ifdef'ed [0], but there's no reference
> as to why it's that way (I've tracked it back to the x86-64 import on
> the historical tree, f3081f5b66a06175ff2dabfe885a53fb04e71076).
> 
> It's a 64bit relocation using a 32bit value, but it's unclear to me
> that modifying the top 32bits is not allowed/intended.

Urg, I've worded this very badly. It's a 64bit relocation using a
32bit value, but it's unclear to me that modifying the top 32bits is
not allowed/intended and fine to be dropped.

Thanks, Roger.



Re: [PATCH 4.16] VT-d: drop undue address-of from check_cleanup_domid_map()

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 04:27:00PM +0100, Jan Beulich wrote:
> For an unknown reason I added back the operator while backporting,
> despite 4.16 having c06e3d810314 ("VT-d: per-domain IOMMU bitmap needs
> to have dynamic size"). I can only assume that I mistakenly took the
> 4.15 backport as basis and/or reference.
> 
> Fixes: fa45f6b5560e ("VT-d: split domid map cleanup check into a function")
> Signed-off-by: Jan Beulich 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH v3 1/2] xen/build: put image header into a separate section

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 04:08:53PM +0100, Jan Beulich wrote:
> On 08.03.2022 15:18, Roger Pau Monné wrote:
> > On Tue, Mar 08, 2022 at 02:57:23PM +0100, Jan Beulich wrote:
> >> On 08.03.2022 14:49, Roger Pau Monne wrote:
> >>> So it can be explicitly placed ahead of the rest of the .text content
> >>> in the linker script (and thus the resulting image). This is a
> >>> prerequisite for further work that will add a catch-all to the text
> >>> section (.text.*).
> >>>
> >>> Note that placement of the sections inside of .text is also slightly
> >>> adjusted to be more similar to the position found in the default GNU
> >>> ld linker script.
> >>>
> >>> The special handling of the object file containing the header data as
> >>> the first object file passed to the linker command line can also be
> >>> removed.
> >>>
> >>> While there also remove the special handling of efi/ on x86. There's
> >>> no need for the resulting object file to be passed in any special
> >>> order to the linker.
> >>>
> >>> Signed-off-by: Roger Pau Monné 
> >>
> >> Looks good to me, but I have one question before feeling ready to
> >> offer R-b:
> >>
> >>> @@ -86,8 +84,13 @@ SECTIONS
> >>> *(.text.kexec)  /* Page aligned in the object file. */
> >>> kexec_reloc_end = .;
> >>>  
> >>> -   *(.text.cold)
> >>> -   *(.text.unlikely)
> >>> +   *(.text.cold .text.cold.*)
> >>> +   *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> >>
> >> What generates .text.*_unlikely? And if anything really does, why
> >> would .text.cold not have a similar equivalent?
> > 
> > That matches what I saw in the default linker script from my version
> > of GNU ld:
> > 
> > *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> > 
> > I really don't know what could generate .text.*_unlikely, but since
> > it's part of the default linker script I assumed it was better to just
> > add it.
> 
> I've checked - gcc up to 4.5.x would generate .text.*_unlikely; from
> 4.6.x. onwards it would be .text.unlikely.*.
> 
> As to the dissimilarity with .text.cold: I wonder why we have that in
> the first place. It matches our __cold attribute, just that we don't
> use that anywhere afaics.
> 
> In any event:
> Reviewed-by: Jan Beulich 
> albeit preferably with .text.cold.* dropped again.

Would you mind dropping the .text.cold.* at commit? Otherwise I can
resend.

Thanks, Roger.



Re: [PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option

2022-03-08 Thread Roger Pau Monné
On Tue, Mar 08, 2022 at 04:13:55PM +0100, Jan Beulich wrote:
> On 08.03.2022 15:46, Roger Pau Monné wrote:
> > On Tue, Mar 08, 2022 at 03:09:17PM +0100, Jan Beulich wrote:
> >> On 08.03.2022 14:49, Roger Pau Monne wrote:
> >>> If livepatching support is enabled build the hypervisor with
> >>> -f{function,data}-sections compiler options, which is required by the
> >>> livepatching tools to detect changes and create livepatches.
> >>>
> >>> This shouldn't result in any functional change on the hypervisor
> >>> binary image, but does however require some changes in the linker
> >>> script in order to handle that each function and data item will now be
> >>> placed into its own section in object files. As a result add catch-all
> >>> for .text, .data and .bss in order to merge each individual item
> >>> section into the final image.
> >>>
> >>> The main difference will be that .text.startup will end up being part
> >>> of .text rather than .init, and thus won't be freed. .text.exit will
> >>> also be part of .text rather than dropped. Overall this could make the
> >>> image bigger, and package some .text code in a sub-optimal way.
> >>>
> >>> On Arm the .data.read_mostly needs to be moved ahead of the .data
> >>> section like it's already done on x86, so the .data.* catch-all
> >>> doesn't also include .data.read_mostly. The alignment of
> >>> .data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end
> >>> up being placed at the tail of a read-only page from the previous
> >>> section. While there move the alignment of the .data section ahead of
> >>> the section declaration, like it's done for other sections.
> >>>
> >>> The benefit of having CONFIG_LIVEPATCH enable those compiler option
> >>> is that the livepatch build tools no longer need to fiddle with the
> >>> build system in order to enable them. Note the current livepatch tools
> >>> are broken after the recent build changes due to the way they
> >>> attempt to set -f{function,data}-sections.
> >>>
> >>> Signed-off-by: Roger Pau Monné 
> >>
> >> Reviewed-by: Jan Beulich 
> >>
> >>> --- a/xen/arch/x86/xen.lds.S
> >>> +++ b/xen/arch/x86/xen.lds.S
> >>> @@ -88,6 +88,9 @@ SECTIONS
> >>> *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> >>>  
> >>> *(.text)
> >>> +#ifdef CONFIG_CC_SPLIT_SECTIONS
> >>> +   *(.text.*)
> >>> +#endif
> >>> *(.text.__x86_indirect_thunk_*)
> >>> *(.text.page_aligned)
> >>
> >> These last two now will not have any effect anymore when
> >> CC_SPLIT_SECTIONS=y. This may have undesirable effects on the
> >> overall size when there is more than one object with a
> >> .text.page_aligned contribution. In .data ...
> > 
> > Agreed. I wondered whether to move those ahead of the main text
> > section, so likely:
> > 
> >*(.text.unlikely .text.*_unlikely .text.unlikely.*)
> > 
> >*(.text.page_aligned)
> >*(.text.__x86_indirect_thunk_*)
> >*(.text)
> > #ifdef CONFIG_CC_SPLIT_SECTIONS
> >*(.text.*)
> > #endif
> 
> Perhaps; I'm not really worried of .text.__x86_indirect_thunk_*,
> though. When adding .text.* that one can likely go away.
> 
> > FWIW, Linux seems fine to package .text.page_aligned together with the
> > rest of .text using the .text.[0-9a-zA-Z_]* catch-all.
> 
> There's no question this is functionally fine. The question is how
> many extra padding areas are inserted because of this.
> 
> >>> @@ -292,9 +295,7 @@ SECTIONS
> >>>  
> >>>DECL_SECTION(.data) {
> >>> *(.data.page_aligned)
> >>> -   *(.data)
> >>> -   *(.data.rel)
> >>> -   *(.data.rel.*)
> >>> +   *(.data .data.*)
> >>>} PHDR(text)
> >>
> >> ... this continues to be named first. I wonder whether we wouldn't
> >> want to use SORT_BY_ALIGNMENT (if available) instead in both places.
> > 
> > We could use the command line option if available
> > (--sort-section=alignment) to sort all wildcard sections?
> 
> Depends on the scope of the sorting that would result when enabled
> globally like this.

I'm not sure I'm following. Don't we generally want to sort by
alignment in order to avoid adding unnecessary padding as much as
possible?

For any wildcard sections we really don't care anymore how they are
sorted?

Thanks, Roger.



Re: [PATCH] x86/kexec: Fix kexec-reboot with CET active

2022-03-08 Thread Jan Beulich
On 08.03.2022 17:22, Andrew Cooper wrote:
> On 08/03/2022 08:15, Jan Beulich wrote:
>> On 07.03.2022 21:53, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/machine_kexec.c
>>> +++ b/xen/arch/x86/machine_kexec.c
>>> @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image)
>>>   */
>>>  local_irq_disable();
>>>  
>>> +/* Reset CPUID masking and faulting to the host's default. */
>>> +ctxt_switch_levelling(NULL);
>>> +
>>> +/* Disable CET. */
>>> +if ( read_cr4() & X86_CR4_CET )
>>> +{
>>> +wrmsrl(MSR_S_CET, 0);
>>> +write_cr4(read_cr4() & ~X86_CR4_CET);
>>> +}
>>> +
>>>  /* Now regular interrupts are disabled, we need to reduce the impact
>>>   * of interrupts not disabled by 'cli'.
>>>   *
>> Besides introducing somewhat of a disconnect between the comment in
>> context here and the earlier local_irq_disable(), is it really
>> necessary to do both actions with IRQs off?
> 
> We are a handful of instructions away from discarding Xen's context
> entirely.  IRQs are not a relevant concern.

Well, as said - the comment was what caught my eye. But as you appear to
think that slight disconnect is not an issue: I don't mean my remark to
be an objection. Feel free to commit with David's R-b.

Jan




Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations

2022-03-08 Thread Jan Beulich
On 08.03.2022 17:26, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 05:15:33PM +0100, Roger Pau Monné wrote:
>> On Tue, Mar 08, 2022 at 04:45:34PM +0100, Jan Beulich wrote:
>>> On 08.03.2022 16:36, Bjoern Doebel wrote:
 --- a/xen/arch/x86/livepatch.c
 +++ b/xen/arch/x86/livepatch.c
 @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf 
 *elf,
  
  val -= (uint64_t)dest;
  *(int32_t *)dest = val;
>>>
>>> Afaict after this assignment ...
>>>
 -if ( (int64_t)val != *(int32_t *)dest )
 +if ( (int32_t)val != *(int32_t *)dest )
>>>
>>> ... this condition can never be false. The cast really wants to be
>>> to int64_t, and the overflow you saw being reported is quite likely
>>> for a different reason. But from the sole message you did quote
>>> it's not really possible to figure what else is wrong.
>>
>> It seems Linux has that check ifdef'ed [0], but there's no reference
>> as to why it's that way (I've tracked it back to the x86-64 import on
>> the historical tree, f3081f5b66a06175ff2dabfe885a53fb04e71076).
>>
>> It's a 64bit relocation using a 32bit value, but it's unclear to me
>> that modifying the top 32bits is not allowed/intended.
> 
> Urg, I've worded this very badly. It's a 64bit relocation using a
> 32bit value, but it's unclear to me that modifying the top 32bits is
> not allowed/intended and fine to be dropped.

I'm still confused: Afaics this is in the handling of R_X86_64_PC32,
which is a 32-bit relocation. Only a 32-bit field in memory is to be
modified, and the resulting value needs to fit such that when the
CPU fetches it and sign-extends it to 64 bits, the original value is
re-established. Hence the check, aiui.
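[The check Jan describes can be sketched in isolation. This is a minimal illustration with made-up names (fits_pc32, broken_fits_pc32 are not Xen functions): a R_X86_64_PC32 relocation stores a 32-bit displacement that the CPU sign-extends back to 64 bits, so the stored value is valid only if that sign-extension reproduces the original displacement.]

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the overflow check under discussion: valid only if
 * sign-extending the stored 32 bits reproduces the 64-bit value. */
static int fits_pc32(uint64_t val)
{
    int32_t stored = (int32_t)val;     /* what gets written to dest */

    /* The cast to int64_t is what makes this a real check ... */
    return (int64_t)val == (int64_t)stored;
}

/* ... whereas casting val down to int32_t first, as in the proposed
 * change, compares the truncated value with itself and never fails. */
static int broken_fits_pc32(uint64_t val)
{
    int32_t stored = (int32_t)val;

    return (int32_t)val == stored;     /* always true */
}
```

[A displacement of 0x80000000 truncates to INT32_MIN, which sign-extends to a negative value, so the int64_t comparison correctly flags the overflow while the int32_t variant cannot.]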

Jan




Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations

2022-03-08 Thread Ross Lagerwall
> From: Roger Pau Monne 
> Sent: Tuesday, March 8, 2022 4:26 PM
> To: Bjoern Doebel ; Jan Beulich 
> Cc: Michael Kurth ; Martin Pohlack ; 
> Konrad Rzeszutek Wilk ; Ross Lagerwall 
> ; xen-devel@lists.xenproject.org 
> 
> Subject: Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when 
> computing ELF relocations 
>  
> On Tue, Mar 08, 2022 at 05:15:33PM +0100, Roger Pau Monné wrote:
> > On Tue, Mar 08, 2022 at 04:45:34PM +0100, Jan Beulich wrote:
> > > On 08.03.2022 16:36, Bjoern Doebel wrote:
> > > > --- a/xen/arch/x86/livepatch.c
> > > > +++ b/xen/arch/x86/livepatch.c
> > > > @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct 
> > > > livepatch_elf *elf,
> > > >  
> > > >  val -= (uint64_t)dest;
> > > >  *(int32_t *)dest = val;
> > > 
> > > Afaict after this assignment ...
> > > 
> > > > -if ( (int64_t)val != *(int32_t *)dest )
> > > > +if ( (int32_t)val != *(int32_t *)dest )
> > > 
> > > ... this condition can never be false. The cast really wants to be
> > > to int64_t, and the overflow you saw being reported is quite likely
> > > for a different reason. But from the sole message you did quote
> > > it's not really possible to figure what else is wrong.
> > 
> > It seems Linux has that check ifdef'ed [0], but there's no reference
> > as to why it's that way (I've tracked it back to the x86-64 import on
> > the historical tree, f3081f5b66a06175ff2dabfe885a53fb04e71076).
> > 
> > It's a 64bit relocation using a 32bit value, but it's unclear to me
> > that modifying the top 32bits is not allowed/intended.
> 
> Urg, I've worded this very badly. It's a 64bit relocation using a
> 32bit value, but it's unclear to me that modifying the top 32bits is
> not allowed/intended and fine to be dropped.
> 
> Thanks, Roger.

I'm not sure what you mean by that. The value is computed based on the
load address and the address of the target symbol - i.e. it is a
PC-relative relocation, and the code is checking that the computed
relative value hasn't overflowed the 32-bit destination in memory
e.g. in the unlikely case that the live patch is loaded far away in
memory from the hypervisor.

The code looks correct to me. It needs investigation to find out why this
particular patch is causing an issue since the code is unchanged since v7
of the original xSplice patch series.

Ross


Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions

2022-03-08 Thread Doebel, Bjoern



On 08.03.22 17:01, Ross Lagerwall wrote:


From: Doebel, Bjoern 
Sent: Tuesday, March 8, 2022 3:41 PM
To: Ross Lagerwall ; xen-devel@lists.xenproject.org 

Cc: Michael Kurth ; Martin Pohlack ; Roger Pau Monne 
; Andrew Cooper ; Konrad Rzeszutek Wilk 

Subject: Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced 
functions


On 08.03.22 16:25, Ross Lagerwall wrote:


From: Bjoern Doebel 
Sent: Tuesday, March 8, 2022 10:29 AM
To: xen-devel@lists.xenproject.org 
Cc: Michael Kurth ; Martin Pohlack ; Roger Pau Monne 
; Andrew Cooper ; Bjoern Doebel ; 
Konrad Rzeszutek Wilk ; Ross Lagerwall 
Subject: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced 
functions

Xen enabled CET for supporting architectures. The control flow aspect of
CET expects functions that can be called indirectly (i.e., via function
pointers) to start with an ENDBR64 instruction. Otherwise a control flow
exception is raised.

This expectation breaks livepatching flows because we patch functions by
overwriting their first 5 bytes with a JMP + <offset>, thus breaking the
ENDBR64. We fix this by checking the start of a patched function for
being ENDBR64. In the positive case we move the livepatch JMP to start
behind the ENDBR64 instruction.

To avoid having to guess the ENDBR64 offset again on patch reversal
(which might race with other mechanisms adding/removing ENDBR
dynamically), use the livepatch metadata to store the computed offset
along with the saved bytes of the overwritten function.

Signed-off-by: Bjoern Doebel 
CC: Konrad Rzeszutek Wilk 
CC: Ross Lagerwall 

Note that on top of livepatching functions, Xen supports an additional
mode where we can "remove" a function by overwriting it with NOPs. This
is only supported for functions up to 31 bytes in size and this patch
reduces this limit to 30 bytes.
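[The ENDBR64 detection described above can be sketched as follows. This is an illustrative fragment, not Xen's implementation; the helper names (is_endbr64, patch_offset) are mine. ENDBR64 encodes as the four bytes F3 0F 1E FA, and the livepatch JMP must land immediately after it when present.]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Does the function start with an ENDBR64 landing pad (F3 0F 1E FA)? */
static bool is_endbr64(const uint8_t *insn)
{
    static const uint8_t endbr64[] = { 0xf3, 0x0f, 0x1e, 0xfa };

    return memcmp(insn, endbr64, sizeof(endbr64)) == 0;
}

/* Offset at which the livepatch JMP may be written: past the ENDBR64
 * when present, so indirect calls still hit a valid landing pad. */
static size_t patch_offset(const uint8_t *func)
{
    return is_endbr64(func) ? 4 : 0;
}
```

[Storing this offset alongside the saved bytes is what lets the revert path undo the patch without re-deriving where the JMP was placed.]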

Changes since r1:
* use sizeof_field() to avoid unused variable warning
* make metadata variable const in arch_livepatch_revert
---
   xen/arch/x86/livepatch.c | 61 ++--
   1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
index 65530c1e57..0fd97f2a00 100644
--- a/xen/arch/x86/livepatch.c
+++ b/xen/arch/x86/livepatch.c
@@ -14,11 +14,29 @@
   #include 
   #include 

+#include 
   #include 
   #include 
   #include 
   #include 

+/*
+ * CET hotpatching support: We may have functions starting with an ENDBR64
+ * instruction that MUST remain the first instruction of the function, hence
+ * we need to move any hotpatch trampoline further into the function. For that
+ * we need to keep track of the patching offset used for any loaded hotpatch
+ * (to avoid racing against other fixups adding/removing ENDBR64 or similar
+ * instructions).
+ *
+ * We do so by making use of the existing opaque metadata area. We use its
+ * first 4 bytes to track the offset into the function used for patching and
+ * the remainder of the data to store overwritten code bytes.
+ */
+struct x86_livepatch_meta {
+uint8_t patch_offset;
+uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)];
+};
+


I think it would make things a bit simpler to use one of the spare _pad bits
from struct livepatch_func rather than hacking it into the opaque area. Is
there a reason you chose this approach?


No specific reason. Are you suggesting updating the public livepatch
interface to add a new member and reduce the padding size by 1 byte? I
guess that will also require a patch to livepatch-build-tools?

Bjoern


struct livepatch_func contains info that the build tool needs to
communicate to Xen as well as space for Xen's internal book keeping
while the live patch is applied. This includes the array for storing
instructions, the applied flag, and now additionally the patch offset.
(It's a somewhat odd arrangement but it's what we've got...)

The build tool does not need to know the details about any of Xen's internal
book keeping. So my preference would be to have patch_offset as an additional
member next to applied (reducing padding by 1) and then in livepatch-build-tools
replace:

 unsigned char pad[MAX_REPLACEMENT_SIZE];
 uint8_t applied;
 uint8_t _pad[7];

with simply:

 uint8_t opaque[39];


What do you think?


That will simplify this patch - I like it. Will send update + 
livepatch-build patch.


Bjoern



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH v3 1/2] xen/build: put image header into a separate section

2022-03-08 Thread Jan Beulich
On 08.03.2022 17:36, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 04:08:53PM +0100, Jan Beulich wrote:
>> On 08.03.2022 15:18, Roger Pau Monné wrote:
>>> On Tue, Mar 08, 2022 at 02:57:23PM +0100, Jan Beulich wrote:
 On 08.03.2022 14:49, Roger Pau Monne wrote:
> So it can be explicitly placed ahead of the rest of the .text content
> in the linker script (and thus the resulting image). This is a
> prerequisite for further work that will add a catch-all to the text
> section (.text.*).
>
> Note that placement of the sections inside of .text is also slightly
> adjusted to be more similar to the position found in the default GNU
> ld linker script.
>
> The special handling of the object file containing the header data as
> the first object file passed to the linker command line can also be
> removed.
>
> While there also remove the special handling of efi/ on x86. There's
> no need for the resulting object file to be passed in any special
> order to the linker.
>
> Signed-off-by: Roger Pau Monné 

 Looks good to me, but I have one question before feeling ready to
 offer R-b:

> @@ -86,8 +84,13 @@ SECTIONS
> *(.text.kexec)  /* Page aligned in the object file. */
> kexec_reloc_end = .;
>  
> -   *(.text.cold)
> -   *(.text.unlikely)
> +   *(.text.cold .text.cold.*)
> +   *(.text.unlikely .text.*_unlikely .text.unlikely.*)

 What generates .text.*_unlikely? And if anything really does, why
 would .text.cold not have a similar equivalent?
>>>
>>> That matches what I saw in the default linker script from my version
>>> of GNU ld:
>>>
>>> *(.text.unlikely .text.*_unlikely .text.unlikely.*)
>>>
>>> I really don't know what could generate .text.*_unlikely, but since
>>> it's part of the default linker script I assumed it was better to just
>>> add it.
>>
>> I've checked - gcc up to 4.5.x would generate .text.*_unlikely; from
>> 4.6.x. onwards it would be .text.unlikely.*.
>>
>> As to the dissimilarity with .text.cold: I wonder why we have that in
>> the first place. It matches our __cold attribute, just that we don't
>> use that anywhere afaics.
>>
>> In any event:
>> Reviewed-by: Jan Beulich 
>> albeit preferably with .text.cold.* dropped again.
> 
> Would you mind dropping the .text.cold.* at commit? Otherwise I can
> resend.

Sure; no need to resend just for this.

Jan




Re: [PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option

2022-03-08 Thread Jan Beulich
On 08.03.2022 17:41, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 04:13:55PM +0100, Jan Beulich wrote:
>> On 08.03.2022 15:46, Roger Pau Monné wrote:
>>> On Tue, Mar 08, 2022 at 03:09:17PM +0100, Jan Beulich wrote:
 On 08.03.2022 14:49, Roger Pau Monne wrote:
> If livepatching support is enabled build the hypervisor with
> -f{function,data}-sections compiler options, which is required by the
> livepatching tools to detect changes and create livepatches.
>
> This shouldn't result in any functional change on the hypervisor
> binary image, but does however require some changes in the linker
> script in order to handle that each function and data item will now be
> placed into its own section in object files. As a result add catch-all
> for .text, .data and .bss in order to merge each individual item
> section into the final image.
>
> The main difference will be that .text.startup will end up being part
> of .text rather than .init, and thus won't be freed. .text.exit will
> also be part of .text rather than dropped. Overall this could make the
> image bigger, and package some .text code in a sub-optimal way.
>
> On Arm the .data.read_mostly needs to be moved ahead of the .data
> section like it's already done on x86, so the .data.* catch-all
> doesn't also include .data.read_mostly. The alignment of
> .data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end
> up being placed at the tail of a read-only page from the previous
> section. While there move the alignment of the .data section ahead of
> the section declaration, like it's done for other sections.
>
> The benefit of having CONFIG_LIVEPATCH enable those compiler option
> is that the livepatch build tools no longer need to fiddle with the
> build system in order to enable them. Note the current livepatch tools
> are broken after the recent build changes due to the way they
> attempt to set -f{function,data}-sections.
>
> Signed-off-by: Roger Pau Monné 

 Reviewed-by: Jan Beulich 

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -88,6 +88,9 @@ SECTIONS
> *(.text.unlikely .text.*_unlikely .text.unlikely.*)
>  
> *(.text)
> +#ifdef CONFIG_CC_SPLIT_SECTIONS
> +   *(.text.*)
> +#endif
> *(.text.__x86_indirect_thunk_*)
> *(.text.page_aligned)

 These last two now will not have any effect anymore when
 CC_SPLIT_SECTIONS=y. This may have undesirable effects on the
 overall size when there is more than one object with a
 .text.page_aligned contribution. In .data ...
>>>
>>> Agreed. I wondered whether to move those ahead of the main text
>>> section, so likely:
>>>
>>>*(.text.unlikely .text.*_unlikely .text.unlikely.*)
>>>
>>>*(.text.page_aligned)
>>>*(.text.__x86_indirect_thunk_*)
>>>*(.text)
>>> #ifdef CONFIG_CC_SPLIT_SECTIONS
>>>*(.text.*)
>>> #endif
>>
>> Perhaps; I'm not really worried of .text.__x86_indirect_thunk_*,
>> though. When adding .text.* that one can likely go away.
>>
>>> FWIW, Linux seems fine to package .text.page_aligned together with the
>>> rest of .text using the .text.[0-9a-zA-Z_]* catch-all.
>>
>> There's no question this is functionally fine. The question is how
>> many extra padding areas are inserted because of this.
>>
> @@ -292,9 +295,7 @@ SECTIONS
>  
>DECL_SECTION(.data) {
> *(.data.page_aligned)
> -   *(.data)
> -   *(.data.rel)
> -   *(.data.rel.*)
> +   *(.data .data.*)
>} PHDR(text)

 ... this continues to be named first. I wonder whether we wouldn't
 want to use SORT_BY_ALIGNMENT (if available) instead in both places.
>>>
>>> We could use the command line option if available
>>> (--sort-section=alignment) to sort all wildcard sections?
>>
>> Depends on the scope of the sorting that would result when enabled
>> globally like this.
> 
> I'm not sure I'm following. Don't we generally want to sort by
> alignment in order to avoid adding unnecessary padding as much as
> possible?
> 
> For any wildcard sections we really don't care anymore how they are
> sorted?

Sure. Question is whether sorting is limited to within any single
*(...) construct, or whether it could extend to adjacent ones. IOW
whether the command line option strictly is a replacement of adding
SORT_BY_ALIGNMENT to every one of these constructs.
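[For reference, the GNU ld manual describes --sort-section=alignment as applying SORT_BY_ALIGNMENT to all wildcard section patterns in the linker script, i.e. sorting stays within each individual *(...) construct rather than crossing into adjacent ones. A sketch of the per-pattern form (illustrative, not the actual xen.lds.S):]

```
SECTIONS {
  .data : {
    *(.data.page_aligned)                /* kept first, unsorted   */
    *(SORT_BY_ALIGNMENT(.data .data.*))  /* sorted within this one
                                            pattern only           */
  }
}
```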

Jan




[PATCH v6 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-03-08 Thread Jane Malalane
Add XEN_SYSCTL_PHYSCAP_X86_ASSISTED_XAPIC and
XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X2APIC to report accelerated xapic
and x2apic, on x86 hardware.
No such features are currently implemented on AMD hardware.

HW assisted xAPIC virtualization will be reported if HW, at the
minimum, supports virtualize_apic_accesses as this feature alone means
that an access to the APIC page will cause an APIC-access VM exit. An
APIC-access VM exit provides a VMM with information about the access
causing the VM exit, unlike a regular EPT fault, thus simplifying some
internal handling.

HW assisted x2APIC virtualization will be reported if HW supports
virtualize_x2apic_mode and, at least, either apic_reg_virt or
virtual_intr_delivery. This also means that
sysctl follows the conditionals in vmx_vlapic_msr_changed().

For that purpose, also add an arch-specific "capabilities" parameter
to struct xen_sysctl_physinfo.

Note that this interface is intended to be compatible with AMD so that
AVIC support can be introduced in a future patch. Unlike Intel that
has multiple controls for APIC Virtualization, AMD has one global
'AVIC Enable' control bit, so fine-graining of APIC virtualization
control cannot be done on a common interface.

Suggested-by: Andrew Cooper 
Signed-off-by: Jane Malalane 
---
CC: Wei Liu 
CC: Anthony PERARD 
CC: Juergen Gross 
CC: Andrew Cooper 
CC: George Dunlap 
CC: Jan Beulich 
CC: Julien Grall 
CC: Stefano Stabellini 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: "Roger Pau Monné" 

v6:
 * Limit abi check to x86
 * Fix coding style issue

v5:
 * Have assisted_xapic_available solely depend on
   cpu_has_vmx_virtualize_apic_accesses and assisted_x2apic_available
   depend on cpu_has_vmx_virtualize_x2apic_mode and
   cpu_has_vmx_apic_reg_virt OR cpu_has_vmx_virtual_intr_delivery

v4:
 * Fallback to the original v2/v1 conditions for setting
   assisted_xapic_available and assisted_x2apic_available so that in
   the future APIC virtualization can be exposed on AMD hardware
   since fine-graining of "AVIC" is not supported, i.e., AMD solely
   uses "AVIC Enable". This also means that sysctl mimics what's
   exposed in CPUID

v3:
 * Define XEN_SYSCTL_PHYSCAP_ARCH_MAX for ABI checking and actually
   set "arch_capbilities", via a call to c_bitmap_to_ocaml_list()
 * Have assisted_x2apic_available only depend on
   cpu_has_vmx_virtualize_x2apic_mode

v2:
 * Use one macro LIBXL_HAVE_PHYSINFO_ASSISTED_APIC instead of two
 * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo
 * Set assisted_x{2}apic_available to be conditional upon "bsp" and
   annotate it with __ro_after_init
 * Change XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_X{2}APIC to
   _X86_ASSISTED_X{2}APIC
 * Keep XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X{2}APIC contained within
   sysctl.h
 * Fix padding introduced in struct xen_sysctl_physinfo and bump
   XEN_SYSCTL_INTERFACE_VERSION
---
 tools/golang/xenlight/helpers.gen.go |  4 
 tools/golang/xenlight/types.gen.go   |  2 ++
 tools/include/libxl.h|  7 +++
 tools/libs/light/libxl.c |  3 +++
 tools/libs/light/libxl_arch.h|  4 
 tools/libs/light/libxl_arm.c |  5 +
 tools/libs/light/libxl_types.idl |  2 ++
 tools/libs/light/libxl_x86.c | 11 +++
 tools/ocaml/libs/xc/xenctrl.ml   |  5 +
 tools/ocaml/libs/xc/xenctrl.mli  |  5 +
 tools/ocaml/libs/xc/xenctrl_stubs.c  | 15 +--
 tools/xl/xl_info.c   |  6 --
 xen/arch/x86/hvm/vmx/vmcs.c  |  9 +
 xen/arch/x86/include/asm/domain.h|  3 +++
 xen/arch/x86/sysctl.c|  7 +++
 xen/include/public/sysctl.h  | 11 ++-
 16 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index b746ff1081..dd4e6c9f14 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -3373,6 +3373,8 @@ x.CapVmtrace = bool(xc.cap_vmtrace)
 x.CapVpmu = bool(xc.cap_vpmu)
 x.CapGnttabV1 = bool(xc.cap_gnttab_v1)
 x.CapGnttabV2 = bool(xc.cap_gnttab_v2)
+x.CapAssistedXapic = bool(xc.cap_assisted_xapic)
+x.CapAssistedX2Apic = bool(xc.cap_assisted_x2apic)
 
  return nil}
 
@@ -3407,6 +3409,8 @@ xc.cap_vmtrace = C.bool(x.CapVmtrace)
 xc.cap_vpmu = C.bool(x.CapVpmu)
 xc.cap_gnttab_v1 = C.bool(x.CapGnttabV1)
 xc.cap_gnttab_v2 = C.bool(x.CapGnttabV2)
+xc.cap_assisted_xapic = C.bool(x.CapAssistedXapic)
+xc.cap_assisted_x2apic = C.bool(x.CapAssistedX2Apic)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index b1e84d5258..87be46c745 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -1014,6 +1014,8 @@ CapVmtrace bool
 CapVpmu bool
 CapGnttabV1 bool
 CapGnttabV2 bool
+CapAssistedXapic bool
+CapAssistedX2Apic bool
 }
 
 type Connectorinfo struct {
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
in

[PATCH v6 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-03-08 Thread Jane Malalane
Introduce a new per-domain creation x86 specific flag to
select whether hardware assisted virtualization should be used for
x{2}APIC.

A per-domain option is added to xl in order to select the usage of
x{2}APIC hardware assisted virtualization, as well as a global
configuration option.

Having all APIC interaction exit to Xen for emulation is slow and can
induce much overhead. Hardware can speed up x{2}APIC by decoding the
APIC access and providing a VM exit with a more specific exit reason
than a regular EPT fault or by altogether avoiding a VM exit.

On the other hand, being able to disable x{2}APIC hardware assisted
virtualization can be useful for testing and debugging purposes.

Note: vmx_install_vlapic_mapping doesn't require modifications
regardless of whether the guest has "Virtualize APIC accesses" enabled
or not, i.e., setting the APIC_ACCESS_ADDR VMCS field is fine so long
as virtualize_apic_accesses is supported by the CPU.

Suggested-by: Andrew Cooper 
Signed-off-by: Jane Malalane 
---
CC: Wei Liu 
CC: Anthony PERARD 
CC: Juergen Gross 
CC: Andrew Cooper 
CC: George Dunlap 
CC: Jan Beulich 
CC: Julien Grall 
CC: Stefano Stabellini 
CC: Christian Lindig 
CC: David Scott 
CC: Volodymyr Babchuk 
CC: "Roger Pau Monné" 

v6:
 * Use ENODEV instead of EINVAL when rejecting assisted_x{2}apic
   for PV guests
 * Move has_assisted_x{2}apic macros out of an Intel specific header
 * Remove references to Intel specific features in documentation

v5:
 * Revert v4 changes in vmx_vlapic_msr_changed(), preserving the use of
   the has_assisted_x{2}apic macros
 * Following changes in assisted_x{2}apic_available definitions in
   patch 1, retighten conditionals for setting
   XEN_HVM_CPUID_APIC_ACCESS_VIRT and XEN_HVM_CPUID_X2APIC_VIRT in
   cpuid_hypervisor_leaves()

v4:
 * Add has_assisted_x{2}apic macros and use them where appropriate
 * Replace CPU checks with per-domain assisted_x{2}apic control
   options in vmx_vlapic_msr_changed() and cpuid_hypervisor_leaves(),
   following edits to assisted_x{2}apic_available definitions in
   patch 1
   Note: new assisted_x{2}apic_available definitions make later
   cpu_has_vmx_apic_reg_virt and cpu_has_vmx_virtual_intr_delivery
   checks redundant in vmx_vlapic_msr_changed()

v3:
 * Change info in xl.cfg to better express reality and fix
   capitalization of x{2}apic
 * Move "physinfo" variable definition to the beginning of
   libxl__domain_build_info_setdefault()
 * Reposition brackets in if statement to match libxl coding style
 * Shorten logic in libxl__arch_domain_build_info_setdefault()
 * Correct dprintk message in arch_sanitise_domain_config()
 * Make appropriate changes in vmx_vlapic_msr_changed() and
   cpuid_hypervisor_leaves() for amended "assisted_x2apic" bit
 * Remove unneeded parentheses

v2:
 * Add a LIBXL_HAVE_ASSISTED_APIC macro
 * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo
 * Add a return statement in now "int"
   libxl__arch_domain_build_info_setdefault()
 * Preserve libxl__arch_domain_build_info_setdefault 's location in
   libxl_create.c
 * Correct x{2}apic default setting logic in
   libxl__arch_domain_prepare_config()
 * Correct logic for parsing assisted_x{2}apic host/guest options in
   xl_parse.c and initialize them to -1 in xl.c
 * Use guest options directly in vmx_vlapic_msr_changed
 * Fix indentation of bool assisted_x{2}apic in struct hvm_domain
 * Add a change in xenctrl_stubs.c to pass xenctrl ABI checks
---
 docs/man/xl.cfg.5.pod.in  | 15 +++
 docs/man/xl.conf.5.pod.in | 12 
 tools/golang/xenlight/helpers.gen.go  | 12 
 tools/golang/xenlight/types.gen.go|  2 ++
 tools/include/libxl.h |  7 +++
 tools/libs/light/libxl_arch.h |  5 +++--
 tools/libs/light/libxl_arm.c  |  7 +--
 tools/libs/light/libxl_create.c   | 22 +-
 tools/libs/light/libxl_types.idl  |  2 ++
 tools/libs/light/libxl_x86.c  | 27 +--
 tools/ocaml/libs/xc/xenctrl.ml|  2 ++
 tools/ocaml/libs/xc/xenctrl.mli   |  2 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c   |  2 +-
 tools/xl/xl.c |  8 
 tools/xl/xl.h |  2 ++
 tools/xl/xl_parse.c   | 16 
 xen/arch/x86/domain.c | 28 +++-
 xen/arch/x86/hvm/vmx/vmcs.c   |  4 
 xen/arch/x86/hvm/vmx/vmx.c| 13 -
 xen/arch/x86/include/asm/hvm/domain.h |  6 ++
 xen/arch/x86/include/asm/hvm/hvm.h|  6 ++
 xen/arch/x86/traps.c  |  5 +++--
 xen/include/public/arch-x86/xen.h |  2 ++
 23 files changed, 179 insertions(+), 28 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index b98d161398..b4239fcc5e 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -1862,6 +1862,21 @@ firmware tables when using certain older guest Operating Systems

Re: [XEN v9 3/4] xen/arm64: io: Handle the abort due to access to stage1 translation table

2022-03-08 Thread Julien Grall

Hi,

On 08/03/2022 11:22, Ayan Kumar Halder wrote:

Hi Julien,

On 07/03/2022 23:59, Julien Grall wrote:

Hi,

On 07/03/2022 22:23, Ayan Kumar Halder wrote:


On 07/03/2022 19:37, Julien Grall wrote:



On 07/03/2022 14:27, Ayan Kumar Halder wrote:

Hi Julien,


Hi Ayan,


Hi Julien,

I need a bit of clarification to understand this.





One clarification.

On 04/03/2022 10:39, Julien Grall wrote:

Hi Ayan,

On 01/03/2022 12:40, Ayan Kumar Halder wrote:
If the abort was caused due to access to stage1 translation table, Xen
will assume that the stage1 translation table is in the non MMIO region.


Reading this commit message again. I think you want to explain why 
we want to do that because, from my understanding, this is 
technically not forbidden by the Arm Arm.


From the previous discussion, we want to do this because we can't 
easily handle such a fault on an emulated region (we have no way to 
return the value read to the walker).


Sorry, can you explain this a bit more? Do you mean that if the page 
table is located in the emulated region, map_domain_page() (called 
from p2m_next_level()) will fail?


For data abort with valid syndrome, you will have a register to write 
back the value read. When the data abort has s1ptw == 1, AFAICT, we 
have no information how to return the value.


Do you mean that for s1ptw, we get an intermediate physical address?

     if ( hpfar_is_valid(xabt.s1ptw, fsc) )
     gpa = get_faulting_ipa(gva);


No. That's not relevant here. We always need an IPA in order to deal with 
the fault.




If the IPA corresponds to an emulated region, then Xen can read the 
emulated address, but can't return the value to the guest OS.

That wouldn't be the guest OS. But the page-table walker in the CPU.

Can Linux or any OS keep its page tables in the direct MMIO space? 
If yes, then try_map_mmio() needs to be invoked to map the region, so 
that the OS can access it. If not, then Xen needs to return an abort 
because the OS may be behaving maliciously.


I think what matters is whether the Arm Arm would or would not allow 
it. From what I can tell there is no such restriction. So we would 
need to be cautious not to restrict it further than necessary.




My understanding from previous discussion was that it does not make 
sense for linux or any OS to keep its page tables in any MMIO region 
(emulated or direct). Please correct me if mistaken.


At the moment, none of the regions emulated by Xen could be used for 
page-tables. I am also not sure how we should handle such access if 
they arise. So it is more convenient to simply forbid them.


Regarding direct MMIO, see above. Correct me if I am wrong, but it 
should not be a problem for Xen to deal with them. So while I agree 
this doesn't seem to make sense, the restriction seems unnecessary.


So the behavior will be :-

1. If the stage1 translation table is in the non MMIO region or 'direct 
mapped' MMIO region, then invoke p2m_resolve_translation_fault() and 
try_map_mmio() to resolve the fault. If it succeeds, then return to the 
guest to retry.


When 1. happens we don't know yet if the stage1 will be a non MMIO 
region or 'direct mapped'. Instead, we only know that the ISS syndrome 
is invalid.


If it is a prefetch abort or the syndrome is invalid, then we should 
call check_p2m().




2. If the previous step fails, and for any other scenario (i.e. the stage1 
translation table is in an emulated MMIO region or the address is invalid), 
return the abort to the guest.


If check_p2m() didn't succeed and it is a fault on a stage-1 walk, then we 
should send an abort.


Cheers,

--
Julien Grall



Re: [PATCH v4] vpci/msix: fix PBA accesses

2022-03-08 Thread Alex Olson
On Tue, 2022-03-08 at 09:31 +0100, Jan Beulich wrote:
> On 07.03.2022 17:37, Roger Pau Monne wrote:
> > Map the PBA in order to access it from the MSI-X read and write
> > handlers. Note that previously the handlers would pass the physical
> > host address into the {read,write}{l,q} handlers, which is wrong as
> > those expect a linear address.
> > 
> > Map the PBA using ioremap when the first access is performed. Note
> > that 32bit arches might want to abstract the call to ioremap into a
> > vPCI arch handler, so they can use a fixmap range to map the PBA.
> > 
> > Reported-by: Jan Beulich 
> > Signed-off-by: Roger Pau Monné 
> 
> Reviewed-by: Jan Beulich 
> 
> > Cc: Alex Olson 
> 
> I'll wait a little with committing, in the hope for Alex to re-provide
> a Tested-by.

It works fine for me, you can add "Tested-by: alex.ol...@starlab.io" to the
commit.

> 
> > --- a/xen/drivers/vpci/msix.c
> > +++ b/xen/drivers/vpci/msix.c
> > @@ -182,6 +182,38 @@ static struct vpci_msix_entry *get_entry(struct vpci_msix *msix,
> >      return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
> >  }
> >  }
> >  
> > +static void __iomem *get_pba(struct vpci *vpci)
> > +{
> > +    struct vpci_msix *msix = vpci->msix;
> > +    /*
> > +     * PBA will only be unmapped when the device is deassigned, so access it
> > +     * without holding the vpci lock.
> > +     */
> > +    void __iomem *pba = read_atomic(&msix->pba);
> > +
> > +    if ( likely(pba) )
> > +        return pba;
> > +
> > +    pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
> > +                  vmsix_table_size(vpci, VPCI_MSIX_PBA));
> > +    if ( !pba )
> > +        return read_atomic(&msix->pba);
> > +
> > +    spin_lock(&vpci->lock);
> > +    if ( !msix->pba )
> > +    {
> > +        write_atomic(&msix->pba, pba);
> > +        spin_unlock(&vpci->lock);
> > +    }
> > +    else
> > +    {
> > +        spin_unlock(&vpci->lock);
> > +        iounmap(pba);
> > +    }
> 
> TBH I had been hoping for just a single spin_unlock(), but you're
> the maintainer of this code ...
> 
> Jan
> 
Thanks

-Alex




Re: [PATCH 1/3] xen/arm: Add i.MX lpuart driver

2022-03-08 Thread Julien Grall




On 28/02/2022 09:27, Peng Fan wrote:

Hi Julien,


Hi Peng,


Subject: Re: [PATCH 1/3] xen/arm: Add i.MX lpuart driver

Hi Peng,

On 28/02/2022 01:07, Peng Fan (OSS) wrote:

From: Peng Fan 


Can you give me a link to the specification and/or a similar driver in Linux?


https://www.nxp.com/webapp/Download?colCode=IMX8QMIEC
Chapter 13.6 Low Power Universal Asynchronous Receiver/
Transmitter (LPUART)
But this requires registration to access.
Ok. I think it would still be valuable to add the link to the spec in 
the commit message.




Linux driver:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/tty/serial/fsl_lpuart.c


I would also add a link to the Linux code just as a reference (if one 
doesn't want to register).


Cheers,

--
Julien Grall


