Re: [PATCH v2] livepatch: set -f{function,data}-sections compiler option
On Mon, Mar 07, 2022 at 06:07:00PM +0100, Jan Beulich wrote:
> On 07.03.2022 17:36, Roger Pau Monné wrote:
> > On Mon, Mar 07, 2022 at 05:28:20PM +0100, Jan Beulich wrote:
> >> On 07.03.2022 16:55, Roger Pau Monne wrote:
> >>> If livepatching support is enabled build the hypervisor with
> >>> -f{function,data}-sections compiler options, which is required by the
> >>> livepatching tools to detect changes and create livepatches.
> >>>
> >>> This shouldn't result in any functional change on the hypervisor
> >>> binary image, but does however require some changes in the linker
> >>> script in order to handle that each function and data item will now be
> >>> placed into its own section in object files. As a result add catch-alls
> >>> for .text, .data and .bss in order to merge each individual item
> >>> section into the final image.
> >>>
> >>> The main difference will be that .text.startup will end up being part
> >>> of .text rather than .init, and thus won't be freed. .text.exit will
> >>> also be part of .text rather than dropped. Overall this could make the
> >>> image bigger, and package some .text code in a sub-optimal way.
> >>>
> >>> Note that placement of the sections inside of .text is also slightly
> >>> adjusted to be more similar to the position found in the default GNU
> >>> ld linker script. This requires having a separate section for the
> >>> header in order to place it at the beginning of the output image,
> >>> followed by the unlikely and related sections, and finally the main
> >>> .text section.
> >>>
> >>> On Arm the .data.read_mostly needs to be moved ahead of the .data
> >>> section like it's already done on x86, and the alignment needs to be
> >>> set to PAGE_SIZE so it doesn't end up being placed at the tail of a
> >>> read-only page from the previous section. While there move the
> >>> alignment of the .data section ahead of the section declaration, like
> >>> it's done for other sections.
> >>>
> >>> The benefit of having CONFIG_LIVEPATCH enable those compiler options
> >>> is that the livepatch build tools no longer need to fiddle with the
> >>> build system in order to enable them. Note the current livepatch tools
> >>> are broken after the recent build changes due to the way they
> >>> attempt to set -f{function,data}-sections.
> >>>
> >>> Signed-off-by: Roger Pau Monné
> >>
> >> The x86 part of this looks fine to me, just one other remark:
> >>
> >>> --- a/xen/common/Kconfig
> >>> +++ b/xen/common/Kconfig
> >>> @@ -350,10 +350,14 @@ source "common/sched/Kconfig"
> >>>  config CRYPTO
> >>>  	bool
> >>>
> >>> +config CC_SPLIT_SECTIONS
> >>> +	bool
> >>
> >> I think this wants to live higher up in the file, perhaps between
> >> ALTERNATIVE_CALL and HAS_ALTERNATIVE. (CRYPTO, as seen in context
> >> here, imo also would better live higher up.) Or alternatively it may
> >> want to move to xen/Kconfig, next to CC_HAS_VISIBILITY_ATTRIBUTE.
> >
> > I was tempted to place it in xen/Kconfig. The logic for the current
> > suggested placement is to be closer to its current only user
> > (LIVEPATCH), but I'm not opposed to moving it somewhere else if
> > there's consensus.
>
> I guess the main question is whether this option is intended to gain
> a prompt. If so, xen/common/Kconfig is likely the better place. If
> not, I think it better fits in xen/Kconfig.

I think it's unlikely for it to gain a prompt; other options selecting it
is what I would expect. Will move to xen/Kconfig unless someone objects.

Thanks, Roger.
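The question above is about a promptless helper symbol that only ever gets enabled via `select`. A minimal sketch of the shape being discussed — the `CC_SPLIT_SECTIONS` declaration is from the patch, while the `LIVEPATCH` prompt text and the `select` line are illustrative assumptions about how a user option would pull it in:

```kconfig
# Promptless helper symbol (no prompt string, so users cannot set it
# directly); it is only turned on when another option selects it.
config CC_SPLIT_SECTIONS
	bool

# Hypothetical user-visible option that enables the helper.
config LIVEPATCH
	bool "Live patching support"
	select CC_SPLIT_SECTIONS
```

Because `CC_SPLIT_SECTIONS` has no prompt, its placement in the Kconfig tree is purely organizational, which is why moving it to xen/Kconfig is harmless.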
Re: [PATCH v2 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
On 07.03.2022 22:13, Bjoern Doebel wrote:
> @@ -159,7 +200,11 @@ void noinline arch_livepatch_apply(struct livepatch_func *func)
>   */
>  void noinline arch_livepatch_revert(const struct livepatch_func *func)
>  {
> -    memcpy(func->old_addr, func->opaque, livepatch_insn_len(func));
> +    struct x86_livepatch_meta *lp;

Repeating my comment here a 3rd time (sorry): const please (and generally
wherever possible).

Jan

> +
> +    lp = (struct x86_livepatch_meta *)func->opaque;
> +
> +    memcpy(func->old_addr + lp->patch_offset, lp->instruction,
> +           livepatch_insn_len(func));
>  }
>
>  /*
Re: [PATCH v2] livepatch: set -f{function,data}-sections compiler option
On Mon, Mar 07, 2022 at 05:19:53PM +, Julien Grall wrote:
> Hi Roger,
>
> On 07/03/2022 15:55, Roger Pau Monne wrote:
> > If livepatching support is enabled build the hypervisor with
> > -f{function,data}-sections compiler options, which is required by the
> > livepatching tools to detect changes and create livepatches.
> >
> > This shouldn't result in any functional change on the hypervisor
> > binary image, but does however require some changes in the linker
> > script in order to handle that each function and data item will now be
> > placed into its own section in object files. As a result add catch-alls
> > for .text, .data and .bss in order to merge each individual item
> > section into the final image.
> >
> > The main difference will be that .text.startup will end up being part
> > of .text rather than .init, and thus won't be freed. .text.exit will
> > also be part of .text rather than dropped. Overall this could make the
> > image bigger, and package some .text code in a sub-optimal way.
> >
> > Note that placement of the sections inside of .text is also slightly
> > adjusted to be more similar to the position found in the default GNU
> > ld linker script. This requires having a separate section for the
> > header in order to place it at the beginning of the output image,
> > followed by the unlikely and related sections, and finally the main
> > .text section.
> >
> > On Arm the .data.read_mostly needs to be moved ahead of the .data
> > section like it's already done on x86, and the alignment needs to be
> > set to PAGE_SIZE so it doesn't end up being placed at the tail of a
> > read-only page from the previous section. While there move the
> > alignment of the .data section ahead of the section declaration, like
> > it's done for other sections.
>
> This sounds like a bug not related to this patch. Can this be split
> separately?

No, .data.read_mostly needs to be moved before .data so that the catch-all
.data.* introduced to .data to account for -fdata-sections doesn't end up
also including .data.read_mostly.

> >
> > The benefit of having CONFIG_LIVEPATCH enable those compiler options
> > is that the livepatch build tools no longer need to fiddle with the
> > build system in order to enable them. Note the current livepatch tools
> > are broken after the recent build changes due to the way they
> > attempt to set -f{function,data}-sections.
> >
> > Signed-off-by: Roger Pau Monné
> > ---
> > Changes since v1:
> >  - Introduce CC_SPLIT_SECTIONS for selecting the compiler options.
> >  - Drop check for compiler options, all supported versions have them.
> >  - Re-arrange section placement in .text, to match the default linker
> >    script.
> >  - Introduce .text.header to contain the headers bits that must appear
> >    first in the final binary.
> > ---
> > Given that now the header is explicitly placed in .text.header, it's
> > likely possible to simplify some of the ordering of the object files
> > for the prelink.o generation, as arch/x86/boot/built_in.o no longer
> > needs to be the first object file in the list.
> >
> > It also seems on Arm the schedulers and hypfs .data sections should be
> > moved into read_mostly.
> > ---
> > Tested by gitlab in order to assert I didn't introduce any regression
> > on Arm specially.
> > ---
> >  xen/Makefile              |  2 ++
> >  xen/arch/arm/arm32/head.S |  1 +
> >  xen/arch/arm/arm64/head.S |  1 +
> >  xen/arch/arm/xen.lds.S    | 49 +-
> >  xen/arch/x86/boot/head.S  |  2 +-
> >  xen/arch/x86/xen.lds.S    | 22 +++---
> >  xen/common/Kconfig        |  4
> >  7 files changed, 49 insertions(+), 32 deletions(-)
> >
> > diff --git a/xen/Makefile b/xen/Makefile
> > index 5c21492d6f..18a4f7e101 100644
> > --- a/xen/Makefile
> > +++ b/xen/Makefile
> > @@ -273,6 +273,8 @@ else
> >  CFLAGS += -fomit-frame-pointer
> >  endif
> >
> > +CFLAGS-$(CONFIG_CC_SPLIT_SECTIONS) += -ffunction-sections -fdata-sections
> > +
> >  CFLAGS += -nostdinc -fno-builtin -fno-common
> >  CFLAGS += -Werror -Wredundant-decls -Wno-pointer-arith
> >  $(call cc-option-add,CFLAGS,CC,-Wvla)
> > diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
> > index 7a906167ef..c837d3054c 100644
> > --- a/xen/arch/arm/arm32/head.S
> > +++ b/xen/arch/arm/arm32/head.S
> > @@ -120,6 +120,7 @@
> >  #endif /* !CONFIG_EARLY_PRINTK */
> >
> > +.section .text.header, "ax", %progbits
> >  .arm
> >
> >  /*
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 66d862fc81..e62c48ec1c 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -133,6 +133,7 @@
> >  add \xb, \xb, x20
> >  .endm
> >
> > +.section .text.header, "ax", %progbits
> >  /*.aarch64*/
> >
> >  /*
> > diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> > index 08016948ab..836da880c3 100644
> > --- a/xen/arch/arm/xen.lds.S
> > +++ b/x
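The linker-script side of the change boils down to the catch-all pattern described in the commit message. A hedged sketch of the shape (the `.text.header` input section is from the patch; the unlikely-section names are borrowed from the default GNU ld script and may not match the actual patch exactly):

```
/* Illustrative sketch only, not the actual Xen linker script change.
 * With -ffunction-sections each function lives in its own .text.<name>
 * input section, so the output .text must fold them all back in,
 * keeping the header first and cold code grouped together. */
.text : {
        _stext = .;                     /* Text section */
        *(.text.header)                 /* entry/header code, must be first */
        *(.text.unlikely .text.*_unlikely .text.unlikely.*)
        *(.text)
        *(.text.*)                      /* catch-all for split functions */
        _etext = .;
}
```

Without the `*(.text.*)` catch-all, the per-function sections would be placed by the linker's defaults (or dropped by `--gc-sections` setups), which is exactly what the commit message's ordering adjustments guard against.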
Re: [PATCH] x86/kexec: Fix kexec-reboot with CET active
On 07.03.2022 21:53, Andrew Cooper wrote:
> --- a/xen/arch/x86/machine_kexec.c
> +++ b/xen/arch/x86/machine_kexec.c
> @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image)
>       */
>      local_irq_disable();
>
> +    /* Reset CPUID masking and faulting to the host's default. */
> +    ctxt_switch_levelling(NULL);
> +
> +    /* Disable CET. */
> +    if ( read_cr4() & X86_CR4_CET )
> +    {
> +        wrmsrl(MSR_S_CET, 0);
> +        write_cr4(read_cr4() & ~X86_CR4_CET);
> +    }
> +
>      /* Now regular interrupts are disabled, we need to reduce the impact
>       * of interrupts not disabled by 'cli'.
>       *

Besides introducing somewhat of a disconnect between the comment in
context here and the earlier local_irq_disable(), is it really necessary
to do both actions with IRQs off?

Jan
Re: [PATCH v2 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
On 08.03.22 09:07, Jan Beulich wrote:
> On 07.03.2022 22:13, Bjoern Doebel wrote:
>> @@ -159,7 +200,11 @@ void noinline arch_livepatch_apply(struct livepatch_func *func)
>>   */
>>  void noinline arch_livepatch_revert(const struct livepatch_func *func)
>>  {
>> -    memcpy(func->old_addr, func->opaque, livepatch_insn_len(func));
>> +    struct x86_livepatch_meta *lp;
>
> Repeating my comment here a 3rd time (sorry): const please (and generally
> wherever possible).

That's embarrassing... ;) Sorry. Will fix.

Bjoern

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
Re: [PATCH] x86/kexec: Use noreturn attributes, and drop unreachable code
On 07.03.2022 22:02, Andrew Cooper wrote:
> kexec_reloc() does not return. Plumbing this property upwards lets us mark
> machine_kexec() and machine_reboot_kexec() noreturn too. This in turn lets us
> drop some unreachable BUG()/return statements.

I'm certainly fine with the added attributes. I'm less convinced of the
removal of BUG() - I'd prefer to leave the elimination of these to the
compiler when they're found to be dead, just to be on the safe side. And
I'm pretty uncertain about the removal of "return", when it comes to old
enough compilers.

> @@ -403,9 +401,6 @@ static long cf_check kexec_reboot(void *_image)
>
>      kexec_common_shutdown();
>      machine_reboot_kexec(image);
> -
> -    BUG();
> -    return 0;
>  }

In principle this function now is also "noreturn", but I think I vaguely
recall some compilers warning about "noreturn" when a function returns
other than void.

Jan
Re: [PATCH v4] vpci/msix: fix PBA accesses
On 07.03.2022 17:37, Roger Pau Monne wrote:
> Map the PBA in order to access it from the MSI-X read and write
> handlers. Note that previously the handlers would pass the physical
> host address into the {read,write}{l,q} handlers, which is wrong as
> those expect a linear address.
>
> Map the PBA using ioremap when the first access is performed. Note
> that 32bit arches might want to abstract the call to ioremap into a
> vPCI arch handler, so they can use a fixmap range to map the PBA.
>
> Reported-by: Jan Beulich
> Signed-off-by: Roger Pau Monné

Reviewed-by: Jan Beulich

> Cc: Alex Olson

I'll wait a little with committing, in the hope for Alex to re-provide
a Tested-by.

> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -182,6 +182,38 @@ static struct vpci_msix_entry *get_entry(struct vpci_msix *msix,
>      return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
>  }
>
> +static void __iomem *get_pba(struct vpci *vpci)
> +{
> +    struct vpci_msix *msix = vpci->msix;
> +    /*
> +     * PBA will only be unmapped when the device is deassigned, so access it
> +     * without holding the vpci lock.
> +     */
> +    void __iomem *pba = read_atomic(&msix->pba);
> +
> +    if ( likely(pba) )
> +        return pba;
> +
> +    pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
> +                  vmsix_table_size(vpci, VPCI_MSIX_PBA));
> +    if ( !pba )
> +        return read_atomic(&msix->pba);
> +
> +    spin_lock(&vpci->lock);
> +    if ( !msix->pba )
> +    {
> +        write_atomic(&msix->pba, pba);
> +        spin_unlock(&vpci->lock);
> +    }
> +    else
> +    {
> +        spin_unlock(&vpci->lock);
> +        iounmap(pba);
> +    }

TBH I had been hoping for just a single spin_unlock(), but you're
the maintainer of this code ...

Jan
Re: [PATCH v3 00/13] xen: drop hypercall function tables
On 08.12.2021 16:55, Juergen Gross wrote:
> In order to avoid indirect function calls on the hypercall path as
> much as possible this series is removing the hypercall function tables
> and is replacing the hypercall handler calls via the function array
> by automatically generated call macros.
>
> Another by-product of generating the call macros is the automatic
> generating of the hypercall handler prototypes from the same data base
> which is used to generate the macros.
>
> This has the additional advantage of using type safe calls of the
> handlers and to ensure related handler (e.g. PV and HVM ones) share
> the same prototypes.
>
> A very brief performance test (parallel build of the Xen hypervisor
> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
> the performance with the patches applied. The test was performed using
> a PV and a PVH guest.
>
> Changes in V2:
> - new patches 6, 14, 15
> - patch 7: support hypercall priorities for faster code
> - comments addressed
>
> Changes in V3:
> - patches 1 and 4 removed as already applied
> - comments addressed
>
> Juergen Gross (13):
>   xen: move do_vcpu_op() to arch specific code
>   xen: harmonize return types of hypercall handlers
>   xen: don't include asm/hypercall.h from C sources
>   xen: include compat/platform.h from hypercall.h
>   xen: generate hypercall interface related code
>   xen: use generated prototypes for hypercall handlers
>   x86/pv-shim: don't modify hypercall table
>   xen/x86: don't use hypercall table for calling compat hypercalls
>   xen/x86: call hypercall handlers via generated macro
>   xen/arm: call hypercall handlers via generated macro
>   xen/x86: add hypercall performance counters for hvm, correct pv
>   xen: drop calls_to_multicall performance counter
>   tools/xenperf: update hypercall names

As it's pretty certain now that parts of this which didn't go in yet will
need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
expecting a v4 instead.

Jan
Re: [PATCH v3 00/13] xen: drop hypercall function tables
On 08.03.22 09:34, Jan Beulich wrote:
> On 08.12.2021 16:55, Juergen Gross wrote:
>> In order to avoid indirect function calls on the hypercall path as
>> much as possible this series is removing the hypercall function tables
>> and is replacing the hypercall handler calls via the function array
>> by automatically generated call macros.
>>
>> Another by-product of generating the call macros is the automatic
>> generating of the hypercall handler prototypes from the same data base
>> which is used to generate the macros.
>>
>> This has the additional advantage of using type safe calls of the
>> handlers and to ensure related handler (e.g. PV and HVM ones) share
>> the same prototypes.
>>
>> A very brief performance test (parallel build of the Xen hypervisor
>> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
>> the performance with the patches applied. The test was performed using
>> a PV and a PVH guest.
>>
>> Changes in V2:
>> - new patches 6, 14, 15
>> - patch 7: support hypercall priorities for faster code
>> - comments addressed
>>
>> Changes in V3:
>> - patches 1 and 4 removed as already applied
>> - comments addressed
>>
>> Juergen Gross (13):
>>   xen: move do_vcpu_op() to arch specific code
>>   xen: harmonize return types of hypercall handlers
>>   xen: don't include asm/hypercall.h from C sources
>>   xen: include compat/platform.h from hypercall.h
>>   xen: generate hypercall interface related code
>>   xen: use generated prototypes for hypercall handlers
>>   x86/pv-shim: don't modify hypercall table
>>   xen/x86: don't use hypercall table for calling compat hypercalls
>>   xen/x86: call hypercall handlers via generated macro
>>   xen/arm: call hypercall handlers via generated macro
>>   xen/x86: add hypercall performance counters for hvm, correct pv
>>   xen: drop calls_to_multicall performance counter
>>   tools/xenperf: update hypercall names
>
> As it's pretty certain now that parts of this which didn't go in yet will
> need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
> expecting a v4 instead.

Yes, I was planning to spin that up soon.

The main remaining question is whether we want to switch the return type
of all hypercalls (or at least the ones common to all archs) not
requiring to return 64-bit values to "int", as Julien requested.

Juergen
Re: [PATCH v3 00/13] xen: drop hypercall function tables
On 08.03.2022 09:39, Juergen Gross wrote:
> On 08.03.22 09:34, Jan Beulich wrote:
>> On 08.12.2021 16:55, Juergen Gross wrote:
>>> In order to avoid indirect function calls on the hypercall path as
>>> much as possible this series is removing the hypercall function tables
>>> and is replacing the hypercall handler calls via the function array
>>> by automatically generated call macros.
>>>
>>> Another by-product of generating the call macros is the automatic
>>> generating of the hypercall handler prototypes from the same data base
>>> which is used to generate the macros.
>>>
>>> This has the additional advantage of using type safe calls of the
>>> handlers and to ensure related handler (e.g. PV and HVM ones) share
>>> the same prototypes.
>>>
>>> A very brief performance test (parallel build of the Xen hypervisor
>>> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
>>> the performance with the patches applied. The test was performed using
>>> a PV and a PVH guest.
>>>
>>> Changes in V2:
>>> - new patches 6, 14, 15
>>> - patch 7: support hypercall priorities for faster code
>>> - comments addressed
>>>
>>> Changes in V3:
>>> - patches 1 and 4 removed as already applied
>>> - comments addressed
>>>
>>> Juergen Gross (13):
>>>    xen: move do_vcpu_op() to arch specific code
>>>    xen: harmonize return types of hypercall handlers
>>>    xen: don't include asm/hypercall.h from C sources
>>>    xen: include compat/platform.h from hypercall.h
>>>    xen: generate hypercall interface related code
>>>    xen: use generated prototypes for hypercall handlers
>>>    x86/pv-shim: don't modify hypercall table
>>>    xen/x86: don't use hypercall table for calling compat hypercalls
>>>    xen/x86: call hypercall handlers via generated macro
>>>    xen/arm: call hypercall handlers via generated macro
>>>    xen/x86: add hypercall performance counters for hvm, correct pv
>>>    xen: drop calls_to_multicall performance counter
>>>    tools/xenperf: update hypercall names
>>
>> As it's pretty certain now that parts of this which didn't go in yet will
>> need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
>> expecting a v4 instead.
>
> Yes, I was planning to spin that up soon.
>
> The main remaining question is whether we want to switch the return type
> of all hypercalls (or at least the ones common to all archs) not
> requiring to return 64-bit values to "int", as Julien requested.

Could you remind me of the (sub)thread this was in, to read through the
justification again? Without recalling any details I guess I'd prefer to
stick to long for non-compat flavors.

Jan
Re: [PATCH v3 00/13] xen: drop hypercall function tables
On 08.03.22 09:50, Jan Beulich wrote:
> On 08.03.2022 09:39, Juergen Gross wrote:
>> On 08.03.22 09:34, Jan Beulich wrote:
>>> On 08.12.2021 16:55, Juergen Gross wrote:
>>>> In order to avoid indirect function calls on the hypercall path as
>>>> much as possible this series is removing the hypercall function tables
>>>> and is replacing the hypercall handler calls via the function array
>>>> by automatically generated call macros.
>>>>
>>>> Another by-product of generating the call macros is the automatic
>>>> generating of the hypercall handler prototypes from the same data base
>>>> which is used to generate the macros.
>>>>
>>>> This has the additional advantage of using type safe calls of the
>>>> handlers and to ensure related handler (e.g. PV and HVM ones) share
>>>> the same prototypes.
>>>>
>>>> A very brief performance test (parallel build of the Xen hypervisor
>>>> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
>>>> the performance with the patches applied. The test was performed using
>>>> a PV and a PVH guest.
>>>>
>>>> Changes in V2:
>>>> - new patches 6, 14, 15
>>>> - patch 7: support hypercall priorities for faster code
>>>> - comments addressed
>>>>
>>>> Changes in V3:
>>>> - patches 1 and 4 removed as already applied
>>>> - comments addressed
>>>>
>>>> Juergen Gross (13):
>>>>    xen: move do_vcpu_op() to arch specific code
>>>>    xen: harmonize return types of hypercall handlers
>>>>    xen: don't include asm/hypercall.h from C sources
>>>>    xen: include compat/platform.h from hypercall.h
>>>>    xen: generate hypercall interface related code
>>>>    xen: use generated prototypes for hypercall handlers
>>>>    x86/pv-shim: don't modify hypercall table
>>>>    xen/x86: don't use hypercall table for calling compat hypercalls
>>>>    xen/x86: call hypercall handlers via generated macro
>>>>    xen/arm: call hypercall handlers via generated macro
>>>>    xen/x86: add hypercall performance counters for hvm, correct pv
>>>>    xen: drop calls_to_multicall performance counter
>>>>    tools/xenperf: update hypercall names
>>>
>>> As it's pretty certain now that parts of this which didn't go in yet will
>>> need re-basing, I'm going to drop this from my waiting-to-be-acked folder,
>>> expecting a v4 instead.
>>
>> Yes, I was planning to spin that up soon.
>>
>> The main remaining question is whether we want to switch the return type
>> of all hypercalls (or at least the ones common to all archs) not
>> requiring to return 64-bit values to "int", as Julien requested.
>
> Could you remind me of the (sub)thread this was in, to read through the
> justification again? Without recalling any details I guess I'd prefer to
> stick to long for non-compat flavors.

This discussion started with:

https://lists.xen.org/archives/html/xen-devel/2021-12/threads.html#01293

Juergen
Re: [PATCH v4] vpci/msix: fix PBA accesses
On Tue, Mar 08, 2022 at 09:31:34AM +0100, Jan Beulich wrote:
> On 07.03.2022 17:37, Roger Pau Monne wrote:
> > Map the PBA in order to access it from the MSI-X read and write
> > handlers. Note that previously the handlers would pass the physical
> > host address into the {read,write}{l,q} handlers, which is wrong as
> > those expect a linear address.
> >
> > Map the PBA using ioremap when the first access is performed. Note
> > that 32bit arches might want to abstract the call to ioremap into a
> > vPCI arch handler, so they can use a fixmap range to map the PBA.
> >
> > Reported-by: Jan Beulich
> > Signed-off-by: Roger Pau Monné
>
> Reviewed-by: Jan Beulich
>
> > Cc: Alex Olson
>
> I'll wait a little with committing, in the hope for Alex to re-provide
> a Tested-by.
>
> > --- a/xen/drivers/vpci/msix.c
> > +++ b/xen/drivers/vpci/msix.c
> > @@ -182,6 +182,38 @@ static struct vpci_msix_entry *get_entry(struct vpci_msix *msix,
> >      return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
> >  }
> >
> > +static void __iomem *get_pba(struct vpci *vpci)
> > +{
> > +    struct vpci_msix *msix = vpci->msix;
> > +    /*
> > +     * PBA will only be unmapped when the device is deassigned, so access it
> > +     * without holding the vpci lock.
> > +     */
> > +    void __iomem *pba = read_atomic(&msix->pba);
> > +
> > +    if ( likely(pba) )
> > +        return pba;
> > +
> > +    pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
> > +                  vmsix_table_size(vpci, VPCI_MSIX_PBA));
> > +    if ( !pba )
> > +        return read_atomic(&msix->pba);
> > +
> > +    spin_lock(&vpci->lock);
> > +    if ( !msix->pba )
> > +    {
> > +        write_atomic(&msix->pba, pba);
> > +        spin_unlock(&vpci->lock);
> > +    }
> > +    else
> > +    {
> > +        spin_unlock(&vpci->lock);
> > +        iounmap(pba);
> > +    }
>
> TBH I had been hoping for just a single spin_unlock(), but you're
> the maintainer of this code ...

Would you prefer something like:

    spin_lock(&vpci->lock);
    if ( !msix->pba )
        write_atomic(&msix->pba, pba);
    spin_unlock(&vpci->lock);

    if ( read_atomic(&msix->pba) != pba )
        iounmap(pba);

?

Thanks, Roger.
[ovmf test] 168472: regressions - FAIL
flight 168472 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168472/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64                   6 xen-build             fail REGR. vs. 168254
 build-amd64-xsm               6 xen-build             fail REGR. vs. 168254
 build-i386                    6 xen-build             fail REGR. vs. 168254
 build-i386-xsm                6 xen-build             fail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt           1 build-check(1)        blocked  n/a
 build-i386-libvirt            1 build-check(1)        blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64   1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf                 62fa37fe7b9df3c54a2d9d90aed9ff0e817ee0c6
baseline version:
 ovmf                 b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z    7 days
Failing since        168258  2022-03-01 01:55:31 Z    7 days   76 attempts
Testing same since   168469  2022-03-07 23:11:34 Z    0 days    2 attempts

People who touched revisions under test:
  Gerd Hoffmann
  Guo Dong
  Guomin Jiang
  Hua Ma
  Jason
  Jason Lou
  Li, Zhihao
  Ma, Hua
  Matt DeVillier
  Patrick Rudolph
  Sean Rhodes
  Sebastien Boeuf
  Xiaolu.Jiang
  Zhihao Li

jobs:
 build-amd64-xsm                                              fail
 build-i386-xsm                                               fail
 build-amd64                                                  fail
 build-i386                                                   fail
 build-amd64-libvirt                                          blocked
 build-i386-libvirt                                           blocked
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         blocked
 test-amd64-i386-xl-qemuu-ovmf-amd64                          blocked

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

(No revision log; it would be 468 lines long.)
[xen-unstable test] 168470: tolerable FAIL - PUSHED
flight 168470 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168470/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop             fail like 168460
 test-armhf-armhf-libvirt     16 saverestore-support-check      fail like 168460
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail like 168460
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop             fail like 168460
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop              fail like 168460
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop              fail like 168460
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check    fail like 168460
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check      fail like 168460
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop             fail like 168460
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop              fail like 168460
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop              fail like 168460
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop             fail like 168460
 test-amd64-amd64-libvirt     15 migrate-support-check          fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check          fail never pass
 test-amd64-i386-libvirt      15 migrate-support-check          fail never pass
 test-amd64-i386-xl-pvshim    14 guest-start                    fail never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check      fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl          15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check          fail never pass
 test-arm64-arm64-xl          16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check      fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check          fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check      fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check          fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check          fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check          fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check          fail never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check      fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check    fail never pass
 test-armhf-armhf-libvirt     15 migrate-support-check          fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check         fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check     fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check          fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check      fail never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check          fail never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check      fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check      fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check        fail never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check          fail never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check      fail never pass
 test-armhf-armhf-xl          15 migrate-support-check          fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check      fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check          fail never pass

version targeted for testing:
 xen                  9d4a44380d273de22d5753883cbf5581795ff24d
baseline version:
 xen                  4cb8d7a06752b368
Re: [PATCH v2] livepatch: set -f{function,data}-sections compiler option
Hi Roger, On 08/03/2022 08:13, Roger Pau Monné wrote: On Mon, Mar 07, 2022 at 05:19:53PM +, Julien Grall wrote: Hi Roger, On 07/03/2022 15:55, Roger Pau Monne wrote: If livepatching support is enabled build the hypervisor with -f{function,data}-sections compiler options, which is required by the livepatching tools to detect changes and create livepatches. This shouldn't result in any functional change on the hypervisor binary image, but does however require some changes in the linker script in order to handle that each function and data item will now be placed into its own section in object files. As a result add catch-all for .text, .data and .bss in order to merge each individual item section into the final image. The main difference will be that .text.startup will end up being part of .text rather than .init, and thus won't be freed. .text.exit will also be part of .text rather than dropped. Overall this could make the image bigger, and package some .text code in a sub-optimal way. Note that placement of the sections inside of .text is also slightly adjusted to be more similar to the position found in the default GNU ld linker script. This requires having a separate section for the header in order to place it at the begging of the output image, followed with the unlikely and related sections, and finally the main .text section. On Arm the .data.read_mostly needs to be moved ahead of the .data section like it's already done on x86, and the alignment needs to be set to PAGE_SIZE so it doesn't end up being placed at the tail of a read-only page from the previous section. While there move the alignment of the .data section ahead of the section declaration, like it's done for other sections. This sounds like a bug not related to this patch. Can this be split separately? No, .data.read_mostly needs to be moved before .data so that the catch all .data.* introduced to .data to account for -fdata-sections doesn't end up also including .data.read_mostly. 
Sorry I misread it. I thought you suggested the alignment was also wrong before this patch. Thanks for the clarification! The benefit of having CONFIG_LIVEPATCH enable those compiler options is that the livepatch build tools no longer need to fiddle with the build system in order to enable them. Note the current livepatch tools are broken after the recent build changes due to the way they attempt to set -f{function,data}-sections. Signed-off-by: Roger Pau Monné --- Changes since v1: - Introduce CC_SPLIT_SECTIONS for selecting the compiler options. - Drop check for compiler options, all supported versions have them. - Re-arrange section placement in .text, to match the default linker script. - Introduce .text.header to contain the header bits that must appear first in the final binary. --- Given that now the header is explicitly placed in .text.header, it's likely possible to simplify some of the ordering of the object files for the prelink.o generation, as arch/x86/boot/built_in.o no longer needs to be the first object file in the list. It also seems on Arm the schedulers and hypfs .data sections should be moved into read_mostly. --- Tested by gitlab in order to assert I didn't introduce any regression, especially on Arm. 
--- xen/Makefile | 2 ++ xen/arch/arm/arm32/head.S | 1 + xen/arch/arm/arm64/head.S | 1 + xen/arch/arm/xen.lds.S| 49 +-- xen/arch/x86/boot/head.S | 2 +- xen/arch/x86/xen.lds.S| 22 +++--- xen/common/Kconfig| 4 7 files changed, 49 insertions(+), 32 deletions(-) diff --git a/xen/Makefile b/xen/Makefile index 5c21492d6f..18a4f7e101 100644 --- a/xen/Makefile +++ b/xen/Makefile @@ -273,6 +273,8 @@ else CFLAGS += -fomit-frame-pointer endif +CFLAGS-$(CONFIG_CC_SPLIT_SECTIONS) += -ffunction-sections -fdata-sections + CFLAGS += -nostdinc -fno-builtin -fno-common CFLAGS += -Werror -Wredundant-decls -Wno-pointer-arith $(call cc-option-add,CFLAGS,CC,-Wvla) diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S index 7a906167ef..c837d3054c 100644 --- a/xen/arch/arm/arm32/head.S +++ b/xen/arch/arm/arm32/head.S @@ -120,6 +120,7 @@ #endif /* !CONFIG_EARLY_PRINTK */ +.section .text.header, "ax", %progbits .arm /* diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S index 66d862fc81..e62c48ec1c 100644 --- a/xen/arch/arm/arm64/head.S +++ b/xen/arch/arm/arm64/head.S @@ -133,6 +133,7 @@ add \xb, \xb, x20 .endm +.section .text.header, "ax", %progbits /*.aarch64*/ /* diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S index 08016948ab..836da880c3 100644 --- a/xen/arch/arm/xen.lds.S +++ b/xen/arch/arm/xen.lds.S @@ -30,9 +30,16 @@ SECTIONS _start = .; .text : { _stext = .;/* Text section */ + *(.text.header) With this change, the changes in arch/*
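As background for readers following the thread: the catch-all placement described in the commit message amounts to a linker-script pattern roughly like the following. This is an illustrative sketch paraphrased from the description above, not the exact hunk from the patch — section names and ordering are assumptions based on the commit message.

```ld
SECTIONS
{
  .text : {
        _stext = .;
        *(.text.header)     /* header/entry code must stay first in the image */
        *(.text.unlikely .text.*_unlikely .text.unlikely.*)
        *(.text)
        *(.text.*)          /* catch-all: with -ffunction-sections each
                               function lands in its own .text.<name> */
  }

  .data.read_mostly : {     /* must be matched before the .data.* catch-all,
                               or it would be swallowed by it */
        *(.data.read_mostly)
  }

  .data : {
        *(.data)
        *(.data.*)          /* catch-all: with -fdata-sections each object
                               lands in its own .data.<name> */
  }
}
```

This also illustrates why .data.read_mostly has to move ahead of .data on Arm: input-section wildcards are matched in script order, so a `*(.data.*)` catch-all placed first would claim .data.read_mostly too.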
Re: [PATCH v2] x86/build: use --orphan-handling linker option if available
On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote: > As was e.g. making necessary 4b7fd8153ddf ("x86: fold sections in final > binaries"), arbitrary sections appearing without our linker script > placing them explicitly can be a problem. Have the linker make us aware > of such sections, so we would know that the script needs adjusting. > > To deal with the resulting warnings: > - Retain .note.* explicitly for ELF, and discard all of them (except the > earlier consumed .note.gnu.build-id) for PE/COFF. > - Have explicit statements for .got, .plt, and alike and add assertions > that they're empty. No output sections will be created for these as > long as they remain empty (or else the assertions would cause early > failure anyway). > - Collect all .rela.* into a single section, with again an assertion > added for the resulting section to be empty. > - Extend the enumerating of .debug_* to ELF. Note that for Clang adding > of .debug_macinfo is necessary. Amend this by its Dwarf5 counterpart, > .debug_macro, then as well (albeit more may need adding for full > coverage). > > Suggested-by: Roger Pau Monné > Signed-off-by: Jan Beulich LGTM, just two questions. > --- > v2: Don't use (NOLOAD) for ELF; it has undocumented (and unhelpful) > behavior with GNU ld there. Also place .{sym,str,shstr}tab for LLVM > ld. > --- > I would have wanted to make this generic (by putting it in > xen/Makefile), but the option cannot be added to LDFLAGS, or else > there'll be a flood of warnings with $(LD) -r. (Besides, adding to > LDFLAGS would mean use of the option on every linker pass rather than > just the last one.) > > Retaining of .note in xen-syms is under question. Plus if we want to > retain all notes, the question is whether they wouldn't better go into > .init.rodata. 
But .note.gnu.build-id shouldn't move there, and when > notes are discontiguous all intermediate space will also be assigned to > the NOTE segment, thus making the contents useless for tools going just > by program headers. > > Newer Clang may require yet more .debug_* to be added. I've only played > with versions 5 and 7 so far. > > Unless we would finally drop all mentioning of Stabs sections, we may > want to extend to there what is done for Dwarf here (allowing the EFI > conditional around the section to also go away). > > See also https://sourceware.org/pipermail/binutils/2022-March/119922.html. > > --- a/xen/arch/x86/Makefile > +++ b/xen/arch/x86/Makefile > @@ -120,6 +120,8 @@ syms-warn-dup-y := --warn-dup > syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) := > syms-warn-dup-$(CONFIG_ENFORCE_UNIQUE_SYMBOLS) := --error-dup > > +orphan-handling-$(call ld-option,--orphan-handling=warn) += > --orphan-handling=warn > + > $(TARGET): TMP = $(@D)/.$(@F).elf32 > $(TARGET): $(TARGET)-syms $(efi-y) $(obj)/boot/mkelf32 > $(obj)/boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) > $(XEN_IMG_OFFSET) \ > @@ -146,7 +148,7 @@ $(TARGET)-syms: $(BASEDIR)/prelink.o $(o > >$(@D)/.$(@F).1.S > $(MAKE) $(build)=$(@D) $(@D)/.$(@F).1.o > $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) \ > - $(@D)/.$(@F).1.o -o $@ > + $(orphan-handling-y) $(@D)/.$(@F).1.o -o $@ > $(NM) -pa --format=sysv $(@D)/$(@F) \ > | $(BASEDIR)/tools/symbols --all-symbols --xensyms --sysv > --sort \ > >$(@D)/$(@F).map > @@ -220,7 +222,7 @@ endif > | $(BASEDIR)/tools/symbols $(all_symbols) --sysv --sort > >$(@D)/.$(@F).1s.S > $(MAKE) $(build)=$(@D) .$(@F).1r.o .$(@F).1s.o > $(LD) $(call EFI_LDFLAGS,$(VIRT_BASE)) -T $(obj)/efi.lds -N $< \ > - $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o $(note_file_option) > -o $@ > + $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o $(orphan-handling-y) > $(note_file_option) -o $@ > $(NM) -pa --format=sysv $(@D)/$(@F) \ > | $(BASEDIR)/tools/symbols --all-symbols --xensyms --sysv > 
--sort >$(@D)/$(@F).map > rm -f $(@D)/.$(@F).[0-9]* $(@D)/..$(@F).[0-9]* > --- a/xen/arch/x86/xen.lds.S > +++ b/xen/arch/x86/xen.lds.S > @@ -12,6 +12,13 @@ > #undef __XEN_VIRT_START > #define __XEN_VIRT_START __image_base__ > #define DECL_SECTION(x) x : > +/* > + * Use the NOLOAD directive, despite currently ignored by (at least) GNU ld > + * for PE output, in order to record that we'd prefer these sections to not > + * be loaded into memory. > + */ > +#define DECL_DEBUG(x, a) #x ALIGN(a) (NOLOAD) : { *(x) } > +#define DECL_DEBUG2(x, y, a) #x ALIGN(a) (NOLOAD) : { *(x) *(y) } > > ENTRY(efi_start) > > @@ -19,6 +26,8 @@ ENTRY(efi_start) > > #define FORMAT "elf64-x86-64" > #define DECL_SECTION(x) #x : AT(ADDR(#x) - __XEN_VIRT_START) > +#define DECL_DEBUG(x, a) #x 0 : { *(x) } > +#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) } Would it be helpful to place those in a > > ENTRY(start_pa) > > @@ -179,6 +188,13 @@ SECTIONS > #endif > #endif > > +#ifndef EFI > +
[libvirt test] 168473: regressions - FAIL
flight 168473 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168473/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt           6 libvirt-build  fail REGR. vs. 151777
 build-amd64-libvirt           6 libvirt-build  fail REGR. vs. 151777
 build-arm64-libvirt           6 libvirt-build  fail REGR. vs. 151777
 build-i386-libvirt            6 libvirt-build  fail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt                            1 build-check(1)  blocked n/a
 test-amd64-amd64-libvirt-pair                       1 build-check(1)  blocked n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  1 build-check(1)  blocked n/a
 test-amd64-amd64-libvirt-vhd                        1 build-check(1)  blocked n/a
 test-amd64-amd64-libvirt-xsm                        1 build-check(1)  blocked n/a
 test-amd64-i386-libvirt                             1 build-check(1)  blocked n/a
 test-amd64-i386-libvirt-pair                        1 build-check(1)  blocked n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   1 build-check(1)  blocked n/a
 test-amd64-i386-libvirt-raw                         1 build-check(1)  blocked n/a
 test-amd64-i386-libvirt-xsm                         1 build-check(1)  blocked n/a
 test-arm64-arm64-libvirt                            1 build-check(1)  blocked n/a
 test-arm64-arm64-libvirt-qcow2                      1 build-check(1)  blocked n/a
 test-arm64-arm64-libvirt-raw                        1 build-check(1)  blocked n/a
 test-armhf-armhf-libvirt-raw                        1 build-check(1)  blocked n/a
 test-arm64-arm64-libvirt-xsm                        1 build-check(1)  blocked n/a
 test-armhf-armhf-libvirt                            1 build-check(1)  blocked n/a
 test-armhf-armhf-libvirt-qcow2                      1 build-check(1)  blocked n/a

version targeted for testing:
 libvirt              34238d7408df4d9c83e7b00accfc310545165787
baseline version:
 libvirt              2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  606 days
Failing since        151818  2020-07-11 04:18:52 Z  605 days  587 attempts
Testing same since   168473  2022-03-08 04:18:53 Z    0 days    1 attempts

People who touched revisions under test:
 Adolfo Jayme Barrientos Aleksandr Alekseev Aleksei Zakharov Andika Triwidada Andrea Bolognani Ani Sinha Balázs Meskó Barrett Schonefeld 
Bastian Germann Bastien Orivel BiaoXiang Ye Bihong Yu Binfeng Wu Bjoern Walk Boris Fiuczynski Brad Laue Brian Turek Bruno Haible Chris Mayo Christian Borntraeger Christian Ehrhardt Christian Kirbach Christian Schoenebeck Christophe Fergeau Cole Robinson Collin Walling Cornelia Huck Cédric Bosdonnat Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Didik Supriadi dinglimin Divya Garg Dmitrii Shcherbakov Dmytro Linkin Eiichi Tsukata Emilio Herrera Eric Farman Erik Skultety Fabian Affolter Fabian Freyer Fabiano Fidêncio Fangge Jin Farhan Ali Fedora Weblate Translation Franck Ridel Gavi Teitz gongwei Guoyi Tu Göran Uddeborg Halil Pasic Han Han Hao Wang Hela Basa Helmut Grohne Hiroki Narukawa Hyman Huang(黄勇) Ian Wienand Ioanna Alifieraki Ivan Teterevkov Jakob Meng Jamie Strandboge Jamie Strandboge Jan Kuparinen jason lee Jean-Baptiste Holcroft Jia Zhou Jianan Gao Jim Fehlig Jin Yan Jing Qi Jinsheng Zhang Jiri Denemark Joachim Falk John Ferlan Jonathan Watt Jonathon Jongsma Julio Faracco Justin Gatzen Ján Tomko Kashyap Chamarthy Kevin Locke Kim InSoo Koichi Murase Kristina Hanicova Laine Stump Laszlo Ersek Lee Yarwood Lei Yang Liao Pingfang Lin Ma Lin Ma Lin Ma Liu Yiding Lubomir Rintel Luke Yue Luyao Zhong Marc Hartmayer Marc-André Lureau Marek Marczykowski-Górecki Markus Schade Martin Kletzander Masayoshi Mizuma Matej Cepl Matt Coleman Matt Coleman Mauro Matteo Cascella Meina Li Michal Privoznik Michał Smyk Milo Casagrande Moshe Levi Muha Aliss Nathan Neal Gompa Nick Chevsky Nick Shyrokovskiy Nickys Music Group Nico Pache Nicolas Lécureuil Nicolas Lécureuil Nikolay Shirokovskiy Olaf Hering Olesya Gerasimenko Or Ozeri Orion Poplawski Pany Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peng Liang Peter Krempa Pino Toscano Pino Toscano Piotr Drąg Prathamesh Chavan Praveen K Paladugu Richard W.M. 
Jones Ricky Tigg Robin Lee Rohit Kumar Roman Bogorodskiy Roman Bolshakov Ryan Gahagan Ryan Schmidt Sam Hartman Scott Shambarger Sebastian Mitterle SeongHyun Jo Shalini Chellathurai Saroja
[PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
Xen enabled CET for supporting architectures. The control flow aspect of CET expects functions that can be called indirectly (i.e., via function pointers) to start with an ENDBR64 instruction. Otherwise a control flow exception is raised. This expectation breaks livepatching flows because we patch functions by overwriting their first 5 bytes with a JMP + , thus breaking the ENDBR64. We fix this by checking the start of a patched function for being ENDBR64. In the positive case we move the livepatch JMP to start behind the ENDBR64 instruction. To avoid having to guess the ENDBR64 offset again on patch reversal (which might race with other mechanisms adding/removing ENDBR dynamically), use the livepatch metadata to store the computed offset along with the saved bytes of the overwritten function. Signed-off-by: Bjoern Doebel CC: Konrad Rzeszutek Wilk CC: Ross Lagerwall Note that on top of livepatching functions, Xen supports an additional mode where we can "remove" a function by overwriting it with NOPs. This is only supported for functions up to 31 bytes in size and this patch reduces this limit to 30 bytes. Changes since r1: * use sizeof_field() to avoid unused variable warning * make metadata variable const in arch_livepatch_revert --- xen/arch/x86/livepatch.c | 61 ++-- 1 file changed, 53 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c index 65530c1e57..0fd97f2a00 100644 --- a/xen/arch/x86/livepatch.c +++ b/xen/arch/x86/livepatch.c @@ -14,11 +14,29 @@ #include #include +#include #include #include #include #include +/* + * CET hotpatching support: We may have functions starting with an ENDBR64 + * instruction that MUST remain the first instruction of the function, hence + * we need to move any hotpatch trampoline further into the function. For that + * we need to keep track of the patching offset used for any loaded hotpatch + * (to avoid racing against other fixups adding/removing ENDBR64 or similar + * instructions). 
+ * + * We do so by making use of the existing opaque metadata area. We use its + * first 4 bytes to track the offset into the function used for patching and + * the remainder of the data to store overwritten code bytes. + */ +struct x86_livepatch_meta { +uint8_t patch_offset; +uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)]; +}; + static bool has_active_waitqueue(const struct vm_event_domain *ved) { /* ved may be xzalloc()'d without INIT_LIST_HEAD() yet. */ @@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void) int arch_livepatch_verify_func(const struct livepatch_func *func) { +BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != LIVEPATCH_OPAQUE_SIZE); + /* If NOPing.. */ if ( !func->new_addr ) { /* Only do up to maximum amount we can put in the ->opaque. */ -if ( func->new_size > sizeof(func->opaque) ) +if ( func->new_size > sizeof_field(struct x86_livepatch_meta, + instruction) ) return -EOPNOTSUPP; if ( func->old_size < func->new_size ) return -EINVAL; } -else if ( func->old_size < ARCH_PATCH_INSN_SIZE ) -return -EINVAL; +else +{ +/* + * Space needed now depends on whether the target function + * starts with an ENDBR64 instruction. 
+ */ +uint8_t needed; + +needed = ARCH_PATCH_INSN_SIZE; +if ( is_endbr64(func->old_addr) ) +needed += ENDBR64_LEN; + +if ( func->old_size < needed ) +return -EINVAL; +} return 0; } @@ -127,15 +161,21 @@ int arch_livepatch_verify_func(const struct livepatch_func *func) void noinline arch_livepatch_apply(struct livepatch_func *func) { uint8_t *old_ptr; -uint8_t insn[sizeof(func->opaque)]; +struct x86_livepatch_meta *lp; +uint8_t insn[sizeof(lp->instruction)]; unsigned int len; +lp = (struct x86_livepatch_meta *)func->opaque; +lp->patch_offset = 0; old_ptr = func->old_addr; len = livepatch_insn_len(func); if ( !len ) return; -memcpy(func->opaque, old_ptr, len); +if ( is_endbr64(old_ptr) ) +lp->patch_offset += ENDBR64_LEN; + +memcpy(lp->instruction, old_ptr + lp->patch_offset, len); if ( func->new_addr ) { int32_t val; @@ -143,14 +183,15 @@ void noinline arch_livepatch_apply(struct livepatch_func *func) BUILD_BUG_ON(ARCH_PATCH_INSN_SIZE != (1 + sizeof(val))); insn[0] = 0xe9; /* Relative jump. */ -val = func->new_addr - func->old_addr - ARCH_PATCH_INSN_SIZE; +val = func->new_addr - (func->old_addr + lp->patch_offset ++ ARCH_PATCH_INSN_SIZE); memcpy(&insn[1], &val, sizeof(val)); } else add_nops(insn, len); -memcpy(old_ptr, insn, len); +memcpy(old_ptr + lp->patch_offs
[ovmf test] 168475: regressions - FAIL
flight 168475 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168475/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64                   6 xen-build      fail REGR. vs. 168254
 build-amd64-xsm               6 xen-build      fail REGR. vs. 168254
 build-i386                    6 xen-build      fail REGR. vs. 168254
 build-i386-xsm                6 xen-build      fail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt                   1 build-check(1)  blocked n/a
 build-i386-libvirt                    1 build-check(1)  blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64   1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf                 62fa37fe7b9df3c54a2d9d90aed9ff0e817ee0c6
baseline version:
 ovmf                 b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z    7 days
Failing since        168258  2022-03-01 01:55:31 Z    7 days   77 attempts
Testing same since   168469  2022-03-07 23:11:34 Z    0 days    3 attempts

People who touched revisions under test:
 Gerd Hoffmann Guo Dong Guomin Jiang Hua Ma Jason Jason Lou Li, Zhihao Ma, Hua Matt DeVillier Patrick Rudolph Sean Rhodes Sebastien Boeuf Xiaolu.Jiang Zhihao Li

jobs:
 build-amd64-xsm                       fail
 build-i386-xsm                        fail
 build-amd64                           fail
 build-i386                            fail
 build-amd64-libvirt                   blocked
 build-i386-libvirt                    blocked
 build-amd64-pvops                     pass
 build-i386-pvops                      pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64  blocked
 test-amd64-i386-xl-qemuu-ovmf-amd64   blocked

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

(No revision log; it would be 468 lines long.)
Re: [PATCH v5 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86
On Mon, Mar 07, 2022 at 03:06:08PM +, Jane Malalane wrote: > Add XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_xapic and > XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_x2apic to report accelerated xapic > and x2apic, on x86 hardware. I think the commit message has gone out of sync with the code, those should now be XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X{2,}APIC > No such features are currently implemented on AMD hardware. > > HW assisted xAPIC virtualization will be reported if HW, at the > minimum, supports virtualize_apic_accesses as this feature alone means > that an access to the APIC page will cause an APIC-access VM exit. An > APIC-access VM exit provides a VMM with information about the access > causing the VM exit, unlike a regular EPT fault, thus simplifying some > internal handling. > > HW assisted x2APIC virtualization will be reported if HW supports > virtualize_x2apic_mode and, at least, either apic_reg_virt or > virtual_intr_delivery. This is due to apic_reg_virt and > virtual_intr_delivery preventing a VM exit from occurring or at least > replacing a regular EPT fault VM-exit with an APIC-access VM-exit on For x2APIC there's no EPT fault, because x2APIC accesses are in the MSR space. I think you can omit this whole sentence and just mention that this now follows the conditionals in vmx_vlapic_msr_changed. > read and write APIC accesses, respectively. This also means that > sysctl follows the conditionals in vmx_vlapic_msr_changed(). > > For that purpose, also add an arch-specific "capabilities" parameter > to struct xen_sysctl_physinfo. > > Note that this interface is intended to be compatible with AMD so that > AVIC support can be introduced in a future patch. Unlike Intel that > has multiple controls for APIC Virtualization, AMD has one global > 'AVIC Enable' control bit, so fine-graining of APIC virtualization > control cannot be done on a common interface. 
> > Suggested-by: Andrew Cooper > Signed-off-by: Jane Malalane > --- > CC: Wei Liu > CC: Anthony PERARD > CC: Juergen Gross > CC: Andrew Cooper > CC: George Dunlap > CC: Jan Beulich > CC: Julien Grall > CC: Stefano Stabellini > CC: Volodymyr Babchuk > CC: Bertrand Marquis > CC: Jun Nakajima > CC: Kevin Tian > CC: "Roger Pau Monné" > > v5: > * Have assisted_xapic_available solely depend on > cpu_has_vmx_virtualize_apic_accesses and assisted_x2apic_available > depend on cpu_has_vmx_virtualize_x2apic_mode and > cpu_has_vmx_apic_reg_virt OR cpu_has_vmx_virtual_intr_delivery. > > v4: > * Fallback to the original v2/v1 conditions for setting >assisted_xapic_available and assisted_x2apic_available so that in >the future APIC virtualization can be exposed on AMD hardware >since fine-graining of "AVIC" is not supported, i.e., AMD solely >uses "AVIC Enable". This also means that sysctl mimics what's >exposed in CPUID. > > v3: > * Define XEN_SYSCTL_PHYSCAP_ARCH_MAX for ABI checking and actually >set "arch_capbilities", via a call to c_bitmap_to_ocaml_list() > * Have assisted_x2apic_available only depend on >cpu_has_vmx_virtualize_x2apic_mode > > v2: > * Use one macro LIBXL_HAVE_PHYSINFO_ASSISTED_APIC instead of two > * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo > * Set assisted_x{2}apic_available to be conditional upon "bsp" and >annotate it with __ro_after_init > * Change XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_X{2}APIC to >_X86_ASSISTED_X{2}APIC > * Keep XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X{2}APIC contained within >sysctl.h > * Fix padding introduced in struct xen_sysctl_physinfo and bump >XEN_SYSCTL_INTERFACE_VERSION > --- > tools/golang/xenlight/helpers.gen.go | 4 > tools/golang/xenlight/types.gen.go | 2 ++ > tools/include/libxl.h| 7 +++ > tools/libs/light/libxl.c | 3 +++ > tools/libs/light/libxl_arch.h| 4 > tools/libs/light/libxl_arm.c | 5 + > tools/libs/light/libxl_types.idl | 2 ++ > tools/libs/light/libxl_x86.c | 11 +++ > tools/ocaml/libs/xc/xenctrl.ml | 5 + > 
tools/ocaml/libs/xc/xenctrl.mli | 5 + > tools/ocaml/libs/xc/xenctrl_stubs.c | 14 +++--- > tools/xl/xl_info.c | 6 -- > xen/arch/x86/hvm/vmx/vmcs.c | 9 + > xen/arch/x86/include/asm/domain.h| 3 +++ > xen/arch/x86/sysctl.c| 7 +++ > xen/include/public/sysctl.h | 11 ++- > 16 files changed, 92 insertions(+), 6 deletions(-) > > diff --git a/tools/golang/xenlight/helpers.gen.go > b/tools/golang/xenlight/helpers.gen.go > index b746ff1081..dd4e6c9f14 100644 > --- a/tools/golang/xenlight/helpers.gen.go > +++ b/tools/golang/xenlight/helpers.gen.go > @@ -3373,6 +3373,8 @@ x.CapVmtrace = bool(xc.cap_vmtrace) > x.CapVpmu = bool(xc.cap_vpmu) > x.CapGnttabV1 = bool(xc.cap_gnttab_v1) > x.CapGnttabV2 = bool(xc.cap_gnttab_v2) > +x.CapAssistedXapic = bool(xc.cap_assisted_xapic) > +x.CapAssistedX2Apic = bool(xc.cap_assis
Re: [PATCH v4] vpci/msix: fix PBA accesses
On 08.03.2022 10:05, Roger Pau Monné wrote: > On Tue, Mar 08, 2022 at 09:31:34AM +0100, Jan Beulich wrote: >> On 07.03.2022 17:37, Roger Pau Monne wrote: >>> Map the PBA in order to access it from the MSI-X read and write >>> handlers. Note that previously the handlers would pass the physical >>> host address into the {read,write}{l,q} handlers, which is wrong as >>> those expect a linear address. >>> >>> Map the PBA using ioremap when the first access is performed. Note >>> that 32bit arches might want to abstract the call to ioremap into a >>> vPCI arch handler, so they can use a fixmap range to map the PBA. >>> >>> Reported-by: Jan Beulich >>> Signed-off-by: Roger Pau Monné >> >> Reviewed-by: Jan Beulich >> >>> Cc: Alex Olson >> >> I'll wait a little with committing, in the hope for Alex to re-provide >> a Tested-by. >> >>> --- a/xen/drivers/vpci/msix.c >>> +++ b/xen/drivers/vpci/msix.c >>> @@ -182,6 +182,38 @@ static struct vpci_msix_entry *get_entry(struct >>> vpci_msix *msix, >>> return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE]; >>> } >>> >>> +static void __iomem *get_pba(struct vpci *vpci) >>> +{ >>> +struct vpci_msix *msix = vpci->msix; >>> +/* >>> + * PBA will only be unmapped when the device is deassigned, so access >>> it >>> + * without holding the vpci lock. >>> + */ >>> +void __iomem *pba = read_atomic(&msix->pba); >>> + >>> +if ( likely(pba) ) >>> +return pba; >>> + >>> +pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA), >>> + vmsix_table_size(vpci, VPCI_MSIX_PBA)); >>> +if ( !pba ) >>> +return read_atomic(&msix->pba); >>> + >>> +spin_lock(&vpci->lock); >>> +if ( !msix->pba ) >>> +{ >>> +write_atomic(&msix->pba, pba); >>> +spin_unlock(&vpci->lock); >>> +} >>> +else >>> +{ >>> +spin_unlock(&vpci->lock); >>> +iounmap(pba); >>> +} >> >> TBH I had been hoping for just a single spin_unlock(), but you're >> the maintainer of this code ... 
> > Would you prefer something like: > > spin_lock(&vpci->lock); > if ( !msix->pba ) > write_atomic(&msix->pba, pba); > spin_unlock(&vpci->lock); > > if ( read_atomic(&msix->pba) != pba ) > iounmap(pba); This or (to avoid the re-read) spin_lock(&vpci->lock); if ( !msix->pba ) { write_atomic(&msix->pba, pba); pba = NULL; } spin_unlock(&vpci->lock); if ( pba ) iounmap(pba); Iirc we have similar constructs elsewhere in a few places. Jan
Re: [PATCH v2] x86/build: use --orphan-handling linker option if available
On 08.03.2022 11:12, Roger Pau Monné wrote: > On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote: >> As was e.g. making necessary 4b7fd8153ddf ("x86: fold sections in final >> binaries"), arbitrary sections appearing without our linker script >> placing them explicitly can be a problem. Have the linker make us aware >> of such sections, so we would know that the script needs adjusting. >> >> To deal with the resulting warnings: >> - Retain .note.* explicitly for ELF, and discard all of them (except the >> earlier consumed .note.gnu.build-id) for PE/COFF. >> - Have explicit statements for .got, .plt, and alike and add assertions >> that they're empty. No output sections will be created for these as >> long as they remain empty (or else the assertions would cause early >> failure anyway). >> - Collect all .rela.* into a single section, with again an assertion >> added for the resulting section to be empty. >> - Extend the enumerating of .debug_* to ELF. Note that for Clang adding >> of .debug_macinfo is necessary. Amend this by its Dwarf5 counterpart, >> .debug_macro, then as well (albeit more may need adding for full >> coverage). >> >> Suggested-by: Roger Pau Monné >> Signed-off-by: Jan Beulich > > LGTM, just two questions. Sure, just that ... >> @@ -19,6 +26,8 @@ ENTRY(efi_start) >> >> #define FORMAT "elf64-x86-64" >> #define DECL_SECTION(x) #x : AT(ADDR(#x) - __XEN_VIRT_START) >> +#define DECL_DEBUG(x, a) #x 0 : { *(x) } >> +#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) } > > Would it be helpful to place those in a ... you may have had a 3rd one? >> @@ -179,6 +188,13 @@ SECTIONS >> #endif >> #endif >> >> +#ifndef EFI >> + /* Retain these just for the purpose of possible analysis tools. */ >> + DECL_SECTION(.note) { >> + *(.note.*) >> + } PHDR(note) PHDR(text) > > Wouldn't it be enough to place it in the note program header? > > The buildid note is already placed in .rodata, so any remaining notes > don't need to be in a LOAD section? 
All the notes will be covered by the NOTE phdr. I had this much later in the script originally, but then the NOTE phdr covered large parts of .init.*. Clearly that yields invalid notes, which analysis (or simple dumping) tools wouldn't be happy about. We might be able to add 2nd NOTE phdr, but mkelf32 assumes exactly 2 phdrs if it finds more than one, so changes there would likely be needed then (which I'd like to avoid for the moment). I'm also not sure in how far tools can be expected to look for multiple NOTE phdrs ... >> +#endif >> + >>_erodata = .; >> >>. = ALIGN(SECTION_ALIGN); >> @@ -266,6 +282,32 @@ SECTIONS >> __ctors_end = .; >>} PHDR(text) >> >> +#ifndef EFI >> + /* >> + * With --orphan-sections=warn (or =error) we need to handle certain >> linker >> + * generated sections. These are all expected to be empty; respective >> + * ASSERT()s can be found towards the end of this file. >> + */ >> + DECL_SECTION(.got) { >> + *(.got) >> + } PHDR(text) >> + DECL_SECTION(.got.plt) { >> + *(.got.plt) >> + } PHDR(text) >> + DECL_SECTION(.igot.plt) { >> + *(.igot.plt) >> + } PHDR(text) >> + DECL_SECTION(.iplt) { >> + *(.iplt) >> + } PHDR(text) >> + DECL_SECTION(.plt) { >> + *(.plt) >> + } PHDR(text) >> + DECL_SECTION(.rela) { >> + *(.rela.*) >> + } PHDR(text) > > Why do you need to explicitly place those in the text program header? I guess that's largely for consistency with all other directives. With the assertions that these need to be empty, we might get away without, as long as no linker would decide to set up another zero-size phdr for them. Jan
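For reference, the empty-section handling described in the patch follows a standard linker-script idiom: give each linker-generated section an explicit home so it is no longer an orphan, then assert that it stayed empty. The fragment below is an illustrative sketch of that idiom, not the exact hunk from the patch.

```ld
  /* Declare the sections so --orphan-handling has nothing to complain about. */
  .got      : { *(.got) }
  .got.plt  : { *(.got.plt) }
  .plt      : { *(.plt) }
  .rela     : { *(.rela.*) }

  /* ... towards the end of the script ... */
  ASSERT(SIZEOF(.got) == 0,  "unexpected GOT entries")
  ASSERT(SIZEOF(.plt) == 0,  "unexpected PLT entries")
  ASSERT(SIZEOF(.rela) == 0, "unexpected RELA relocations")
```

Since GNU ld omits output sections that end up with zero size, the declarations cost nothing in the final binary as long as the assertions hold.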
Re: [XEN v9 3/4] xen/arm64: io: Handle the abort due to access to stage1 translation table
Hi Julien, On 07/03/2022 23:59, Julien Grall wrote: Hi, On 07/03/2022 22:23, Ayan Kumar Halder wrote: On 07/03/2022 19:37, Julien Grall wrote: On 07/03/2022 14:27, Ayan Kumar Halder wrote: Hi Julien, Hi Ayan, Hi Julien, I need a bit of clarification to understand this. One clarification. On 04/03/2022 10:39, Julien Grall wrote: Hi Ayan, On 01/03/2022 12:40, Ayan Kumar Halder wrote: If the abort was caused due to access to stage1 translation table, Xen will assume that the stage1 translation table is in the non MMIO region. Reading this commit message again. I think you want to explain why we want to do that because, from my understanding, this is technically not forbidden by the Arm Arm. From the previous discussion, we want to do this because we can't easily handle such a fault on an emulated region (we have no way to return the value read to the walker). Sorry, can you explain this a bit more? Do you mean that if the page table is located in the emulated region, map_domain_page() (called from p2m_next_level()) will fail. For a data abort with valid syndrome, you will have a register to write back the value read. When the data abort has s1ptw == 1, AFAICT, we have no information how to return the value. Do you mean that for s1ptw, we get an intermediate physical address ? if ( hpfar_is_valid(xabt.s1ptw, fsc) ) gpa = get_faulting_ipa(gva); If the IPA corresponds to an emulated region, then Xen can read the emulated address, but can't return the value to the guest OS. (I actually want to understand this very well). But for emulated region, shouldn't pages be already mapped for Xen to access them ? I am not sure which "pages" you are referring to here. The implementation of emulated regions is left to the discretion of Xen. This may be reading/writing to a buffer allocated by Xen or return a fixed value. It will try to resolve the translation fault. If it succeeds, it will return to the guest to retry the instruction. 
If not, then it means that the table is in MMIO region which is not expected by Xen. Thus, Xen will forward the abort to the guest. Signed-off-by: Ayan Kumar Halder --- Changelog :- v1..v8 - NA v9 - 1. Extracted this change from "[XEN v8 2/2] xen/arm64: io: Support instructions (for which ISS is not..." into a separate patch of its own. The reason being this is an existing bug in the codebase. xen/arch/arm/io.c | 11 +++ xen/arch/arm/traps.c | 12 +++- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c index bea69ffb08..ebcb8ed548 100644 --- a/xen/arch/arm/io.c +++ b/xen/arch/arm/io.c @@ -128,6 +128,17 @@ void try_decode_instruction(const struct cpu_user_regs *regs, return; } + /* + * At this point, we know that the stage1 translation table is in the MMIO + * region. This is not expected by Xen and thus it forwards the abort to the We don't know that. We only know that there is no corresponding valid mapping in the P2M. So the address may be part of an emulated MMIO region or invalid. For both cases, we will want to send an abort. Furthermore, I would say "emulated MMIO region" rather than MMIO region because the P2M can also contain MMIO mappings (we usually call them "direct MMIO"). Can I say MMIO region (to indicate both emulated and direct) ? The reason being the assumption that stage1 page tables cannot be in the MMIO region. And thus, when check_p2m() is invoked, we do not invoke try_map_mmio(gaddr_to_gfn(gpa). See this snippet :- if ( xabt.s1ptw ) check_mmio_region = false; Thinking a bit more of this. I would actually drop this check. We don't need to decode the instruction, so I don't think there is much benefit here to restricting access for direct MMIO. Did I miss anything? Can Linux or any OS keep its page tables in the direct MMIO space ? If yes, then try_map_mmio() needs to be invoked to map the region, so that OS can access it. 
If not, then Xen needs to return an abort because the OS may be behaving maliciously.

I think what matters is whether the Arm Arm would or would not allow it. From what I can tell, there is no such restriction. So we would need to be cautious about restricting it further than necessary.

My understanding from previous discussion was that it does not make sense for Linux or any OS to keep its page tables in any MMIO region (emulated or direct). Please correct me if mistaken.

At the moment, none of the regions emulated by Xen could be used for page-tables. I am also not sure how we should handle such accesses if they arise. So it is more convenient to simply forbid them. Regarding direct MMIO, see above. Correct me if I am wrong, but it should not be a problem for Xen to deal with them. So while I agree this doesn't seem to make sense, the restriction seems unnecessary.

So the behavior will be :-
1. If the stage1 trans
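The decision flow being debated in this thread can be sketched in plain C: resolve the fault in the P2M if possible, never attempt MMIO emulation when the abort was taken during a stage-1 page-table walk (s1ptw), and otherwise forward the abort. This is an illustrative model with made-up names, not Xen's actual io.c/traps.c code:

```c
#include <stdbool.h>

/*
 * Illustrative model of the abort-handling decision discussed above.
 * All names here are hypothetical; Xen's real logic lives in
 * xen/arch/arm/traps.c and xen/arch/arm/io.c.
 */
enum abort_action {
    RETRY_INSTRUCTION,   /* P2M fault resolved: re-execute the access */
    DECODE_AND_EMULATE,  /* emulated MMIO with a target register to fill */
    FORWARD_ABORT,       /* nothing Xen can do: inject the abort */
};

static enum abort_action
handle_data_abort(bool p2m_resolved, bool s1ptw, bool emulated_mmio)
{
    if (p2m_resolved)
        return RETRY_INSTRUCTION;

    /*
     * A fault taken during a stage-1 page-table walk has no destination
     * register, so a value read from an emulated region could not be
     * handed back to the walker: forward the abort instead.
     */
    if (s1ptw)
        return FORWARD_ABORT;

    if (emulated_mmio)
        return DECODE_AND_EMULATE;

    return FORWARD_ABORT;
}
```

The s1ptw branch captures the point Julien makes above: emulation requires somewhere to put the value read, which a page-table walk does not provide.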
[ovmf test] 168477: regressions - FAIL
flight 168477 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168477/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64                   6 xen-build                fail REGR. vs. 168254
 build-amd64-xsm               6 xen-build                fail REGR. vs. 168254
 build-i386                    6 xen-build                fail REGR. vs. 168254
 build-i386-xsm                6 xen-build                fail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt                   1 build-check(1)   blocked  n/a
 build-i386-libvirt                    1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64   1 build-check(1)   blocked  n/a

version targeted for testing:
 ovmf                 62fa37fe7b9df3c54a2d9d90aed9ff0e817ee0c6
baseline version:
 ovmf                 b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z    8 days
Failing since        168258  2022-03-01 01:55:31 Z    7 days   78 attempts
Testing same since   168469  2022-03-07 23:11:34 Z    0 days    4 attempts

People who touched revisions under test:
  Gerd Hoffmann
  Guo Dong
  Guomin Jiang
  Hua Ma
  Jason
  Jason Lou
  Li, Zhihao
  Ma, Hua
  Matt DeVillier
  Patrick Rudolph
  Sean Rhodes
  Sebastien Boeuf
  Xiaolu.Jiang
  Zhihao Li

jobs:
 build-amd64-xsm                                              fail
 build-i386-xsm                                               fail
 build-amd64                                                  fail
 build-i386                                                   fail
 build-amd64-libvirt                                          blocked
 build-i386-libvirt                                           blocked
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         blocked
 test-amd64-i386-xl-qemuu-ovmf-amd64                          blocked

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

(No revision log; it would be 468 lines long.)
Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote: > Introduce a new per-domain creation x86 specific flag to > select whether hardware assisted virtualization should be used for > x{2}APIC. > > A per-domain option is added to xl in order to select the usage of > x{2}APIC hardware assisted virtualization, as well as a global > configuration option. > > Having all APIC interaction exit to Xen for emulation is slow and can > induce much overhead. Hardware can speed up x{2}APIC by decoding the > APIC access and providing a VM exit with a more specific exit reason > than a regular EPT fault or by altogether avoiding a VM exit. > > On the other hand, being able to disable x{2}APIC hardware assisted > virtualization can be useful for testing and debugging purposes. > > Note: vmx_install_vlapic_mapping doesn't require modifications > regardless of whether the guest has "Virtualize APIC accesses" enabled > or not, i.e., setting the APIC_ACCESS_ADDR VMCS field is fine so long > as virtualize_apic_accesses is supported by the CPU. 
>
> Suggested-by: Andrew Cooper
> Signed-off-by: Jane Malalane
> ---
> CC: Wei Liu
> CC: Anthony PERARD
> CC: Juergen Gross
> CC: Andrew Cooper
> CC: George Dunlap
> CC: Jan Beulich
> CC: Julien Grall
> CC: Stefano Stabellini
> CC: Christian Lindig
> CC: David Scott
> CC: Volodymyr Babchuk
> CC: "Roger Pau Monné"
>
> v5:
> * Revert v4 changes in vmx_vlapic_msr_changed(), preserving the use of
>   the has_assisted_x{2}apic macros
> * Following changes in assisted_x{2}apic_available definitions in
>   patch 1, retighten conditionals for setting
>   XEN_HVM_CPUID_APIC_ACCESS_VIRT and XEN_HVM_CPUID_X2APIC_VIRT in
>   cpuid_hypervisor_leaves()
>
> v4:
> * Add has_assisted_x{2}apic macros and use them where appropriate
> * Replace CPU checks with per-domain assisted_x{2}apic control
>   options in vmx_vlapic_msr_changed() and cpuid_hypervisor_leaves(),
>   following edits to assisted_x{2}apic_available definitions in
>   patch 1
>   Note: new assisted_x{2}apic_available definitions make later
>   cpu_has_vmx_apic_reg_virt and cpu_has_vmx_virtual_intr_delivery
>   checks redundant in vmx_vlapic_msr_changed()
>
> v3:
> * Change info in xl.cfg to better express reality and fix
>   capitalization of x{2}apic
> * Move "physinfo" variable definition to the beginning of
>   libxl__domain_build_info_setdefault()
> * Reposition brackets in if statement to match libxl coding style
> * Shorten logic in libxl__arch_domain_build_info_setdefault()
> * Correct dprintk message in arch_sanitise_domain_config()
> * Make appropriate changes in vmx_vlapic_msr_changed() and
>   cpuid_hypervisor_leaves() for amended "assisted_x2apic" bit
> * Remove unneeded parentheses
>
> v2:
> * Add a LIBXL_HAVE_ASSISTED_APIC macro
> * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo
> * Add a return statement in now "int"
>   libxl__arch_domain_build_info_setdefault()
> * Preserve libxl__arch_domain_build_info_setdefault's location in
>   libxl_create.c
> * Correct x{2}apic default setting logic in
>   libxl__arch_domain_prepare_config()
>
* Correct logic for parsing assisted_x{2}apic host/guest options in
>   xl_parse.c and initialize them to -1 in xl.c
> * Use guest options directly in vmx_vlapic_msr_changed
> * Fix indentation of bool assisted_x{2}apic in struct hvm_domain
> * Add a change in xenctrl_stubs.c to pass xenctrl ABI checks
> ---
>  docs/man/xl.cfg.5.pod.in                | 19 +++
>  docs/man/xl.conf.5.pod.in               | 12
>  tools/golang/xenlight/helpers.gen.go    | 12
>  tools/golang/xenlight/types.gen.go      |  2 ++
>  tools/include/libxl.h                   |  7 +++
>  tools/libs/light/libxl_arch.h           |  5 +++--
>  tools/libs/light/libxl_arm.c            |  7 +--
>  tools/libs/light/libxl_create.c         | 22 +-
>  tools/libs/light/libxl_types.idl        |  2 ++
>  tools/libs/light/libxl_x86.c            | 28 ++--
>  tools/ocaml/libs/xc/xenctrl.ml          |  2 ++
>  tools/ocaml/libs/xc/xenctrl.mli         |  2 ++
>  tools/ocaml/libs/xc/xenctrl_stubs.c     |  2 +-
>  tools/xl/xl.c                           |  8
>  tools/xl/xl.h                           |  2 ++
>  tools/xl/xl_parse.c                     | 16
>  xen/arch/x86/domain.c                   | 28 +++-
>  xen/arch/x86/hvm/vmx/vmcs.c             |  4
>  xen/arch/x86/hvm/vmx/vmx.c              | 13 -
>  xen/arch/x86/include/asm/hvm/domain.h   |  6 ++
>  xen/arch/x86/include/asm/hvm/vmx/vmcs.h |  3 +++
>  xen/arch/x86/traps.c                    |  9 +
>  xen/include/public/arch-x86/xen.h       |  2 ++
>  23 files changed, 183 insertions(+), 30 deletions(-)
>
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index b98d161398..dcca564a23 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++
Re: [PATCH v2] x86/build: use --orphan-handling linker option if available
On Tue, Mar 08, 2022 at 12:15:04PM +0100, Jan Beulich wrote: > On 08.03.2022 11:12, Roger Pau Monné wrote: > > On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote: > >> As was e.g. making necessary 4b7fd8153ddf ("x86: fold sections in final > >> binaries"), arbitrary sections appearing without our linker script > >> placing them explicitly can be a problem. Have the linker make us aware > >> of such sections, so we would know that the script needs adjusting. > >> > >> To deal with the resulting warnings: > >> - Retain .note.* explicitly for ELF, and discard all of them (except the > >> earlier consumed .note.gnu.build-id) for PE/COFF. > >> - Have explicit statements for .got, .plt, and alike and add assertions > >> that they're empty. No output sections will be created for these as > >> long as they remain empty (or else the assertions would cause early > >> failure anyway). > >> - Collect all .rela.* into a single section, with again an assertion > >> added for the resulting section to be empty. > >> - Extend the enumerating of .debug_* to ELF. Note that for Clang adding > >> of .debug_macinfo is necessary. Amend this by its Dwarf5 counterpart, > >> .debug_macro, then as well (albeit more may need adding for full > >> coverage). > >> > >> Suggested-by: Roger Pau Monné > >> Signed-off-by: Jan Beulich > > > > LGTM, just two questions. > > Sure, just that ... > > >> @@ -19,6 +26,8 @@ ENTRY(efi_start) > >> > >> #define FORMAT "elf64-x86-64" > >> #define DECL_SECTION(x) #x : AT(ADDR(#x) - __XEN_VIRT_START) > >> +#define DECL_DEBUG(x, a) #x 0 : { *(x) } > >> +#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) } > > > > Would it be helpful to place those in a > > ... you may have had a 3rd one? Oh, no, I just forgot to remove this. I was going to ask whether we could place those in something akin to a PT_NOLOAD program section, but that doesn't exist AFAICT (and even if possible would require adjustments to mkelf32). 
> > >> @@ -179,6 +188,13 @@ SECTIONS > >> #endif > >> #endif > >> > >> +#ifndef EFI > >> + /* Retain these just for the purpose of possible analysis tools. */ > >> + DECL_SECTION(.note) { > >> + *(.note.*) > >> + } PHDR(note) PHDR(text) > > > > Wouldn't it be enough to place it in the note program header? > > > > The buildid note is already placed in .rodata, so any remaining notes > > don't need to be in a LOAD section? > > All the notes will be covered by the NOTE phdr. I had this much later > in the script originally, but then the NOTE phdr covered large parts of > .init.*. Clearly that yields invalid notes, which analysis (or simple > dumping) tools wouldn't be happy about. We might be able to add 2nd > NOTE phdr, but mkelf32 assumes exactly 2 phdrs if it finds more than > one, so changes there would likely be needed then (which I'd like to > avoid for the moment). I'm also not sure in how far tools can be > expected to look for multiple NOTE phdrs ... But if we are adding a .note section now we might as well merge it with .note.gnu.build-id: DECL_SECTION(.note) { __note_gnu_build_id_start = .; *(.note.gnu.build-id) __note_gnu_build_id_end = .; *(.note.*) } PHDR(note) PHDR(text) And drop the .note.Xen section? > >> +#endif > >> + > >>_erodata = .; > >> > >>. = ALIGN(SECTION_ALIGN); > >> @@ -266,6 +282,32 @@ SECTIONS > >> __ctors_end = .; > >>} PHDR(text) > >> > >> +#ifndef EFI > >> + /* > >> + * With --orphan-sections=warn (or =error) we need to handle certain > >> linker > >> + * generated sections. These are all expected to be empty; respective > >> + * ASSERT()s can be found towards the end of this file. 
> >> + */ > >> + DECL_SECTION(.got) { > >> + *(.got) > >> + } PHDR(text) > >> + DECL_SECTION(.got.plt) { > >> + *(.got.plt) > >> + } PHDR(text) > >> + DECL_SECTION(.igot.plt) { > >> + *(.igot.plt) > >> + } PHDR(text) > >> + DECL_SECTION(.iplt) { > >> + *(.iplt) > >> + } PHDR(text) > >> + DECL_SECTION(.plt) { > >> + *(.plt) > >> + } PHDR(text) > >> + DECL_SECTION(.rela) { > >> + *(.rela.*) > >> + } PHDR(text) > > > > Why do you need to explicitly place those in the text program header? > > I guess that's largely for consistency with all other directives. With the > assertions that these need to be empty, we might get away without, as long > as no linker would decide to set up another zero-size phdr for them. We already set the debug sections to not be part of any program header and seem to get away with it. I'm not sure how different the sections handled below would be, linkers might indeed want to place them regardless? If so it might be good to add a comment that while those should be empty (and thus don't end up in any program header) we assign them to the text one in order to avoid the linker from creating a new program header for them. Thanks, Roger.
Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
On 08.03.2022 12:38, Roger Pau Monné wrote: > On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote: >> @@ -685,13 +687,31 @@ int arch_sanitise_domain_config(struct >> xen_domctl_createdomain *config) >> } >> } >> >> -if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED ) >> +if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED | >> + XEN_X86_ASSISTED_XAPIC | >> + XEN_X86_ASSISTED_X2APIC) ) >> { >> dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n", >> config->arch.misc_flags); >> return -EINVAL; >> } >> >> +if ( (assisted_xapic || assisted_x2apic) && !hvm ) >> +{ >> +dprintk(XENLOG_INFO, >> +"Interrupt Controller Virtualization not supported for >> PV\n"); >> +return -EINVAL; >> +} >> + >> +if ( (assisted_xapic && !assisted_xapic_available) || >> + (assisted_x2apic && !assisted_x2apic_available) ) >> +{ >> +dprintk(XENLOG_INFO, >> +"Hardware assisted x%sAPIC requested but not available\n", >> +assisted_xapic && !assisted_xapic_available ? "" : "2"); >> +return -EINVAL; > > I think for those two you could return -ENODEV if others agree. If by "two" you mean the xAPIC and x2APIC aspects here (and not e.g. this and the earlier if()), then I agree. I'm always in favor of using distinct error codes when possible and at least halfway sensible. Jan
Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
On Tue, Mar 08, 2022 at 01:24:23PM +0100, Jan Beulich wrote: > On 08.03.2022 12:38, Roger Pau Monné wrote: > > On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote: > >> @@ -685,13 +687,31 @@ int arch_sanitise_domain_config(struct > >> xen_domctl_createdomain *config) > >> } > >> } > >> > >> -if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED ) > >> +if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED | > >> + XEN_X86_ASSISTED_XAPIC | > >> + XEN_X86_ASSISTED_X2APIC) ) > >> { > >> dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n", > >> config->arch.misc_flags); > >> return -EINVAL; > >> } > >> > >> +if ( (assisted_xapic || assisted_x2apic) && !hvm ) > >> +{ > >> +dprintk(XENLOG_INFO, > >> +"Interrupt Controller Virtualization not supported for > >> PV\n"); > >> +return -EINVAL; > >> +} > >> + > >> +if ( (assisted_xapic && !assisted_xapic_available) || > >> + (assisted_x2apic && !assisted_x2apic_available) ) > >> +{ > >> +dprintk(XENLOG_INFO, > >> +"Hardware assisted x%sAPIC requested but not available\n", > >> +assisted_xapic && !assisted_xapic_available ? "" : "2"); > >> +return -EINVAL; > > > > I think for those two you could return -ENODEV if others agree. > > If by "two" you mean the xAPIC and x2APIC aspects here (and not e.g. this > and the earlier if()), then I agree. I'm always in favor of using distinct > error codes when possible and at least halfway sensible. I would be fine by using it for the !hvm if also. IMO it makes sense as PV doesn't have an APIC 'device' at all, so ENODEV would seem fitting. EINVAL is also fine as the caller shouldn't even attempt that in the first place. So let's use it for the last if only. Thanks, Roger.
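The convention settled on in this exchange — keep -EINVAL for "the caller shouldn't even attempt that" (PV with no APIC to assist) and use -ENODEV for "requested but the hardware lacks it" — can be summarized in a small sketch. The function name and shape are illustrative only, not the committed arch_sanitise_domain_config() code:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Illustrative sketch of the error-code split agreed in this thread;
 * not Xen's actual domain-creation sanitisation code.
 */
static int check_assisted_apic(bool requested, bool available, bool hvm)
{
    if (requested && !hvm)
        return -EINVAL;   /* PV has no APIC 'device' to accelerate */

    if (requested && !available)
        return -ENODEV;   /* hardware lacks the assistance */

    return 0;
}
```

Distinct error codes let the toolstack tell a malformed request apart from a missing hardware feature, which is the point Jan makes above.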
Re: [PATCH v2] x86/build: use --orphan-handling linker option if available
On 08.03.2022 13:11, Roger Pau Monné wrote: > On Tue, Mar 08, 2022 at 12:15:04PM +0100, Jan Beulich wrote: >> On 08.03.2022 11:12, Roger Pau Monné wrote: >>> On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote: @@ -179,6 +188,13 @@ SECTIONS #endif #endif +#ifndef EFI + /* Retain these just for the purpose of possible analysis tools. */ + DECL_SECTION(.note) { + *(.note.*) + } PHDR(note) PHDR(text) >>> >>> Wouldn't it be enough to place it in the note program header? >>> >>> The buildid note is already placed in .rodata, so any remaining notes >>> don't need to be in a LOAD section? >> >> All the notes will be covered by the NOTE phdr. I had this much later >> in the script originally, but then the NOTE phdr covered large parts of >> .init.*. Clearly that yields invalid notes, which analysis (or simple >> dumping) tools wouldn't be happy about. We might be able to add 2nd >> NOTE phdr, but mkelf32 assumes exactly 2 phdrs if it finds more than >> one, so changes there would likely be needed then (which I'd like to >> avoid for the moment). I'm also not sure in how far tools can be >> expected to look for multiple NOTE phdrs ... > > But if we are adding a .note section now we might as well merge it > with .note.gnu.build-id: > > DECL_SECTION(.note) { >__note_gnu_build_id_start = .; >*(.note.gnu.build-id) >__note_gnu_build_id_end = .; >*(.note.*) > } PHDR(note) PHDR(text) > > And drop the .note.Xen section? In an ideal world we likely could, yes. But do we know for sure that nothing recognizes the Xen notes by section name? .note.gnu.build-id cannot be folded in any event - see the rule for generating note.o, to be used by xen.efi linking in certain cases. +#endif + _erodata = .; . = ALIGN(SECTION_ALIGN); @@ -266,6 +282,32 @@ SECTIONS __ctors_end = .; } PHDR(text) +#ifndef EFI + /* + * With --orphan-sections=warn (or =error) we need to handle certain linker + * generated sections. 
These are all expected to be empty; respective + * ASSERT()s can be found towards the end of this file. + */ + DECL_SECTION(.got) { + *(.got) + } PHDR(text) + DECL_SECTION(.got.plt) { + *(.got.plt) + } PHDR(text) + DECL_SECTION(.igot.plt) { + *(.igot.plt) + } PHDR(text) + DECL_SECTION(.iplt) { + *(.iplt) + } PHDR(text) + DECL_SECTION(.plt) { + *(.plt) + } PHDR(text) + DECL_SECTION(.rela) { + *(.rela.*) + } PHDR(text) >>> >>> Why do you need to explicitly place those in the text program header? >> >> I guess that's largely for consistency with all other directives. With the >> assertions that these need to be empty, we might get away without, as long >> as no linker would decide to set up another zero-size phdr for them. > > We already set the debug sections to not be part of any program header > and seem to get away with it. I'm not sure how different the sections > handled below would be, linkers might indeed want to place them > regardless? Simply because I don't know I'd like to be on the safe side. Debug sections can't really be taken as reference: At least GNU ld heavily special-cases them anyway. > If so it might be good to add a comment that while those should be > empty (and thus don't end up in any program header) we assign them to > the text one in order to avoid the linker from creating a new program > header for them. I'll add a sentence to the comment I'm already adding here. Jan
Re: [PATCH v4] vpci/msix: fix PBA accesses
On Tue, Mar 08, 2022 at 11:46:20AM +0100, Jan Beulich wrote: > On 08.03.2022 10:05, Roger Pau Monné wrote: > > On Tue, Mar 08, 2022 at 09:31:34AM +0100, Jan Beulich wrote: > >> On 07.03.2022 17:37, Roger Pau Monne wrote: > >>> Map the PBA in order to access it from the MSI-X read and write > >>> handlers. Note that previously the handlers would pass the physical > >>> host address into the {read,write}{l,q} handlers, which is wrong as > >>> those expect a linear address. > >>> > >>> Map the PBA using ioremap when the first access is performed. Note > >>> that 32bit arches might want to abstract the call to ioremap into a > >>> vPCI arch handler, so they can use a fixmap range to map the PBA. > >>> > >>> Reported-by: Jan Beulich > >>> Signed-off-by: Roger Pau Monné > >> > >> Reviewed-by: Jan Beulich > >> > >>> Cc: Alex Olson > >> > >> I'll wait a little with committing, in the hope for Alex to re-provide > >> a Tested-by. > >> > >>> --- a/xen/drivers/vpci/msix.c > >>> +++ b/xen/drivers/vpci/msix.c > >>> @@ -182,6 +182,38 @@ static struct vpci_msix_entry *get_entry(struct > >>> vpci_msix *msix, > >>> return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE]; > >>> } > >>> > >>> +static void __iomem *get_pba(struct vpci *vpci) > >>> +{ > >>> +struct vpci_msix *msix = vpci->msix; > >>> +/* > >>> + * PBA will only be unmapped when the device is deassigned, so > >>> access it > >>> + * without holding the vpci lock. 
> >>> + */ > >>> +void __iomem *pba = read_atomic(&msix->pba); > >>> + > >>> +if ( likely(pba) ) > >>> +return pba; > >>> + > >>> +pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA), > >>> + vmsix_table_size(vpci, VPCI_MSIX_PBA)); > >>> +if ( !pba ) > >>> +return read_atomic(&msix->pba); > >>> + > >>> +spin_lock(&vpci->lock); > >>> +if ( !msix->pba ) > >>> +{ > >>> +write_atomic(&msix->pba, pba); > >>> +spin_unlock(&vpci->lock); > >>> +} > >>> +else > >>> +{ > >>> +spin_unlock(&vpci->lock); > >>> +iounmap(pba); > >>> +} > >> > >> TBH I had been hoping for just a single spin_unlock(), but you're > >> the maintainer of this code ... > > > > Would you prefer something like: > > > > spin_lock(&vpci->lock); > > if ( !msix->pba ) > > write_atomic(&msix->pba, pba); > > spin_unlock(&vpci->lock); > > > > if ( read_atomic(&msix->pba) != pba ) > > iounmap(pba); > > This or (to avoid the re-read) > > spin_lock(&vpci->lock); > if ( !msix->pba ) > { > write_atomic(&msix->pba, pba); > pba = NULL; > } > spin_unlock(&vpci->lock); > > if ( pba ) > iounmap(pba); > > Iirc we have similar constructs elsewhere in a few places. I don't really have a strong opinion. I agree the duplicated spin_unlock() call is not nice, but I think the logic is easier to follow by using a single if ... else ... section. Feel free to adjust at commit, or else I can resend if you prefer. Thanks, Roger.
Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
On 08/03/2022 10:29, Bjoern Doebel wrote:
> @@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void)
>
>  int arch_livepatch_verify_func(const struct livepatch_func *func)
>  {
> +    BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != LIVEPATCH_OPAQUE_SIZE);
> +
>      /* If NOPing.. */
>      if ( !func->new_addr )
>      {
>          /* Only do up to maximum amount we can put in the ->opaque. */
> -        if ( func->new_size > sizeof(func->opaque) )
> +        if ( func->new_size > sizeof_field(struct x86_livepatch_meta,
> +                                           instruction) )
>              return -EOPNOTSUPP;
>
>          if ( func->old_size < func->new_size )
>              return -EINVAL;
>      }
> -    else if ( func->old_size < ARCH_PATCH_INSN_SIZE )
> -        return -EINVAL;
> +    else
> +    {
> +        /*
> +         * Space needed now depends on whether the target function
> +         * starts with an ENDBR64 instruction.
> +         */
> +        uint8_t needed;
> +
> +        needed = ARCH_PATCH_INSN_SIZE;
> +        if ( is_endbr64(func->old_addr) )
> +            needed += ENDBR64_LEN;

This won't work for cf_clobber targets, I don't think. The ENDBR gets converted to NOP4 and fails this check, but the altcalls calling old_func had their displacements adjusted by +4.

The is_endbr64() check will fail, and the 5-byte jmp will be written at the start of the function, and corrupt the instruction stream for the altcall()'d callers.

Let me write an incremental patch to help.

~Andrew
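For reference, ENDBR64 has a fixed 4-byte encoding (F3 0F 1E FA), so recognizing it is a plain byte compare — roughly the memcmp() approach Bjoern asks about later in this thread. The sketch below is a minimal stand-in, not Xen's actual is_endbr64(); and, as Andrew points out, once cf_clobber has rewritten the bytes to a 4-byte NOP such a check deliberately no longer matches:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ENDBR64_LEN 4

/* Minimal stand-in for Xen's is_endbr64(): match the fixed encoding. */
static bool is_endbr64(const void *ptr)
{
    static const uint8_t endbr64[ENDBR64_LEN] = { 0xf3, 0x0f, 0x1e, 0xfa };

    return memcmp(ptr, endbr64, ENDBR64_LEN) == 0;
}
```

A NOP4 such as 0F 1F 40 00 fails this compare, which is exactly the cf_clobber case: the check says "no ENDBR here", yet callers' displacements were already adjusted past the original 4 bytes.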
Re: [PATCH v3 00/13] xen: drop hypercall function tables
On 08.03.2022 09:39, Juergen Gross wrote: > On 08.03.22 09:34, Jan Beulich wrote: >> On 08.12.2021 16:55, Juergen Gross wrote: >>> In order to avoid indirect function calls on the hypercall path as >>> much as possible this series is removing the hypercall function tables >>> and is replacing the hypercall handler calls via the function array >>> by automatically generated call macros. >>> >>> Another by-product of generating the call macros is the automatic >>> generating of the hypercall handler prototypes from the same data base >>> which is used to generate the macros. >>> >>> This has the additional advantage of using type safe calls of the >>> handlers and to ensure related handler (e.g. PV and HVM ones) share >>> the same prototypes. >>> >>> A very brief performance test (parallel build of the Xen hypervisor >>> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of >>> the performance with the patches applied. The test was performed using >>> a PV and a PVH guest. >>> >>> Changes in V2: >>> - new patches 6, 14, 15 >>> - patch 7: support hypercall priorities for faster code >>> - comments addressed >>> >>> Changes in V3: >>> - patches 1 and 4 removed as already applied >>> - comments addressed >>> >>> Juergen Gross (13): >>>xen: move do_vcpu_op() to arch specific code >>>xen: harmonize return types of hypercall handlers >>>xen: don't include asm/hypercall.h from C sources >>>xen: include compat/platform.h from hypercall.h >>>xen: generate hypercall interface related code >>>xen: use generated prototypes for hypercall handlers >>>x86/pv-shim: don't modify hypercall table >>>xen/x86: don't use hypercall table for calling compat hypercalls >>>xen/x86: call hypercall handlers via generated macro >>>xen/arm: call hypercall handlers via generated macro >>>xen/x86: add hypercall performance counters for hvm, correct pv >>>xen: drop calls_to_multicall performance counter >>>tools/xenperf: update hypercall names >> >> As it's pretty certain now that 
parts of this which didn't go in yet will >> need re-basing, I'm going to drop this from my waiting-to-be-acked folder, >> expecting a v4 instead. > > Yes, I was planning to spin that up soon. > > The main remaining question is whether we want to switch the return type > of all hypercalls (or at least the ones common to all archs) not > requiring to return 64-bit values to "int", as Julien requested. After walking through the earlier discussion (Jürgen - thanks for the link) I'm inclined to say that if Arm wants their return values limited to 32 bits (with exceptions where needed), so be it. But on x86 I'd rather not see us change this aspect. Of course I'd much prefer if architectures didn't diverge in this regard, yet then again Arm has already diverged in avoiding the compat layer (in this case I view the divergence as helpful, though, as it avoids unnecessary headache). Jan
Re: [PATCH v3 00/13] xen: drop hypercall function tables
On 08.03.22 13:50, Jan Beulich wrote:
On 08.03.2022 09:39, Juergen Gross wrote:
On 08.03.22 09:34, Jan Beulich wrote:
On 08.12.2021 16:55, Juergen Gross wrote:

In order to avoid indirect function calls on the hypercall path as much as possible this series is removing the hypercall function tables and is replacing the hypercall handler calls via the function array by automatically generated call macros.

Another by-product of generating the call macros is the automatic generation of the hypercall handler prototypes from the same data base which is used to generate the macros.

This has the additional advantage of using type safe calls of the handlers and of ensuring related handlers (e.g. PV and HVM ones) share the same prototypes.

A very brief performance test (parallel build of the Xen hypervisor in a 6 vcpu guest) showed a very slim improvement (less than 1%) of the performance with the patches applied. The test was performed using a PV and a PVH guest.

Changes in V2:
- new patches 6, 14, 15
- patch 7: support hypercall priorities for faster code
- comments addressed

Changes in V3:
- patches 1 and 4 removed as already applied
- comments addressed

Juergen Gross (13):
  xen: move do_vcpu_op() to arch specific code
  xen: harmonize return types of hypercall handlers
  xen: don't include asm/hypercall.h from C sources
  xen: include compat/platform.h from hypercall.h
  xen: generate hypercall interface related code
  xen: use generated prototypes for hypercall handlers
  x86/pv-shim: don't modify hypercall table
  xen/x86: don't use hypercall table for calling compat hypercalls
  xen/x86: call hypercall handlers via generated macro
  xen/arm: call hypercall handlers via generated macro
  xen/x86: add hypercall performance counters for hvm, correct pv
  xen: drop calls_to_multicall performance counter
  tools/xenperf: update hypercall names

As it's pretty certain now that parts of this which didn't go in yet will need re-basing, I'm going to drop this from my waiting-to-be-acked folder, expecting a v4 instead.

Yes, I was planning to spin that up soon.

The main remaining question is whether we want to switch the return type of all hypercalls (or at least the ones common to all archs) not requiring to return 64-bit values to "int", as Julien requested.

After walking through the earlier discussion (Jürgen - thanks for the link) I'm inclined to say that if Arm wants their return values limited to 32 bits (with exceptions where needed), so be it. But on x86 I'd rather not see us change this aspect. Of course I'd much prefer if architectures didn't diverge in this regard, yet then again Arm has already diverged in avoiding the compat layer (in this case I view the divergence as helpful, though, as it avoids unnecessary headache).

How to handle this in common code then? Have a hypercall_ret_t type (exact naming TBD) which is defined as long on x86 and int on Arm? Or use long in the handlers and check the value on the Arm side to be a valid 32-bit signed int (this would be cumbersome for the exceptions, though)?

Juergen
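One possible shape of the first option Juergen floats — a per-architecture return type — might look as follows. The type name and the per-arch choice are exactly what is still under discussion here, so this is purely illustrative; CONFIG_ARM is used the way Xen's Kconfig would define it:

```c
#include <limits.h>

/* Hypothetical per-arch hypercall return type, as floated above. */
#ifdef CONFIG_ARM
typedef int hypercall_ret_t;    /* 32-bit returns, with explicit exceptions */
#else
typedef long hypercall_ret_t;   /* x86 keeps full-width returns */
#endif

/*
 * The second option would instead keep "long" in all handlers and have
 * the Arm side verify each value fits a signed 32-bit range, e.g.:
 */
static int fits_in_int(long v)
{
    return v >= INT_MIN && v <= INT_MAX;
}
```

The typedef keeps common code arch-agnostic, while the range-check variant pushes the divergence to the call return path — the trade-off Juergen's question is about.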
[ovmf test] 168478: regressions - FAIL
flight 168478 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168478/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm               6 xen-build                fail REGR. vs. 168254
 build-amd64                   6 xen-build                fail REGR. vs. 168254
 build-i386                    6 xen-build                fail REGR. vs. 168254
 build-i386-xsm                6 xen-build                fail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt                   1 build-check(1)   blocked  n/a
 build-i386-libvirt                    1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64   1 build-check(1)   blocked  n/a

version targeted for testing:
 ovmf                 62fa37fe7b9df3c54a2d9d90aed9ff0e817ee0c6
baseline version:
 ovmf                 b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z    8 days
Failing since        168258  2022-03-01 01:55:31 Z    7 days   79 attempts
Testing same since   168469  2022-03-07 23:11:34 Z    0 days    5 attempts

People who touched revisions under test:
  Gerd Hoffmann
  Guo Dong
  Guomin Jiang
  Hua Ma
  Jason
  Jason Lou
  Li, Zhihao
  Ma, Hua
  Matt DeVillier
  Patrick Rudolph
  Sean Rhodes
  Sebastien Boeuf
  Xiaolu.Jiang
  Zhihao Li

jobs:
 build-amd64-xsm                                              fail
 build-i386-xsm                                               fail
 build-amd64                                                  fail
 build-i386                                                   fail
 build-amd64-libvirt                                          blocked
 build-i386-libvirt                                           blocked
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         blocked
 test-amd64-i386-xl-qemuu-ovmf-amd64                          blocked

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

(No revision log; it would be 468 lines long.)
Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
On Tue, Mar 08, 2022 at 12:44:54PM +, Andrew Cooper wrote: > On 08/03/2022 10:29, Bjoern Doebel wrote: > > @@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void) > > > > int arch_livepatch_verify_func(const struct livepatch_func *func) > > { > > +BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != > > LIVEPATCH_OPAQUE_SIZE); > > + > > /* If NOPing.. */ > > if ( !func->new_addr ) > > { > > /* Only do up to maximum amount we can put in the ->opaque. */ > > -if ( func->new_size > sizeof(func->opaque) ) > > +if ( func->new_size > sizeof_field(struct x86_livepatch_meta, > > + instruction) ) > > return -EOPNOTSUPP; > > > > if ( func->old_size < func->new_size ) > > return -EINVAL; > > } > > -else if ( func->old_size < ARCH_PATCH_INSN_SIZE ) > > -return -EINVAL; > > +else > > +{ > > +/* > > + * Space needed now depends on whether the target function > > + * starts with an ENDBR64 instruction. > > + */ > > +uint8_t needed; > > + > > +needed = ARCH_PATCH_INSN_SIZE; > > +if ( is_endbr64(func->old_addr) ) > > +needed += ENDBR64_LEN; > > This won't work for cf_clobber targets, I don't think. The ENDBR gets > converted to NOP4 and fails this check, but the altcalls calling > old_func had their displacements adjusted by +4. > > The is_endbr64() check will fail, and the 5-byte jmp will be written at > the start of the function, and corrupt the instruction stream for the > altcall()'d callers. > > Let me write an incremental patch to help. Please add Acked-by: Konrad Rzeszutek Wilk on the patches. Thank you > > ~Andrew
Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
On 08.03.22 13:44, Andrew Cooper wrote:
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

> On 08/03/2022 10:29, Bjoern Doebel wrote:
>> @@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void)
>>
>>  int arch_livepatch_verify_func(const struct livepatch_func *func)
>>  {
>> +    BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != LIVEPATCH_OPAQUE_SIZE);
>> +
>>      /* If NOPing.. */
>>      if ( !func->new_addr )
>>      {
>>          /* Only do up to maximum amount we can put in the ->opaque. */
>> -        if ( func->new_size > sizeof(func->opaque) )
>> +        if ( func->new_size > sizeof_field(struct x86_livepatch_meta,
>> +                                           instruction) )
>>              return -EOPNOTSUPP;
>>
>>          if ( func->old_size < func->new_size )
>>              return -EINVAL;
>>      }
>> -    else if ( func->old_size < ARCH_PATCH_INSN_SIZE )
>> -        return -EINVAL;
>> +    else
>> +    {
>> +        /*
>> +         * Space needed now depends on whether the target function
>> +         * starts with an ENDBR64 instruction.
>> +         */
>> +        uint8_t needed;
>> +
>> +        needed = ARCH_PATCH_INSN_SIZE;
>> +        if ( is_endbr64(func->old_addr) )
>> +            needed += ENDBR64_LEN;
>
> This won't work for cf_clobber targets, I don't think. The ENDBR gets
> converted to NOP4 and fails this check, but the altcalls calling
> old_func had their displacements adjusted by +4.
>
> The is_endbr64() check will fail, and the 5-byte jmp will be written at
> the start of the function, and corrupt the instruction stream for the
> altcall()'d callers.
>
> Let me write an incremental patch to help.

Thanks. Will you be adding a memcmp(func->old_addr, ideal_nops[4], 4) or
is that once more too naive?

Bjoern

> ~Andrew

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
On 08.03.22 14:06, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 08, 2022 at 12:44:54PM +, Andrew Cooper wrote:
>> On 08/03/2022 10:29, Bjoern Doebel wrote:
>>> [...]
>>
>> This won't work for cf_clobber targets, I don't think. The ENDBR gets
>> converted to NOP4 and fails this check, but the altcalls calling
>> old_func had their displacements adjusted by +4.
>>
>> The is_endbr64() check will fail, and the 5-byte jmp will be written at
>> the start of the function, and corrupt the instruction stream for the
>> altcall()'d callers.
>>
>> Let me write an incremental patch to help.
>
> Please add Acked-by: Konrad Rzeszutek Wilk on the patches.

Thanks, will do!

Bjoern

> Thank you
>> ~Andrew
Re: [PATCH 1/4] livepatch: update readme to mention --xen-depends
> From: Roger Pau Monne > Sent: Wednesday, March 2, 2022 2:27 PM > To: xen-devel@lists.xenproject.org > Cc: Ross Lagerwall ; konrad.w...@oracle.com > ; doe...@amazon.de ; jul...@xen.org > ; Andrew Cooper ; Roger Pau Monne > > Subject: [PATCH 1/4] livepatch: update readme to mention --xen-depends > > Fixes: b19df7b2c05e ('livepatch-build: Embed hypervisor build id into every > hotpatch') > Signed-off-by: Roger Pau Monné > --- > README.md | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/README.md b/README.md > index 653c624..b48a3df 100644 > --- a/README.md > +++ b/README.md > @@ -24,7 +24,7 @@ Next, build a live patch, using a patch and the source, > build ID, and > ``` > $ cd ~/src/livepatch-build > $ ./livepatch-build -s ~/src/xenbuild -p ~/src/xsa.patch -o out \ > --c ~/src/xen/xen/.config --depends $BUILDID > +-c ~/src/xen/xen/.config --depends $BUILDID --xen-depends $BUILDID > Building LivePatch patch: xsa > > Xen directory: /home/ross/src/xenbuild > -- > 2.34.1 Reviewed-by: Ross Lagerwall
Re: [PATCH v3 00/13] xen: drop hypercall function tables
On 08.03.2022 13:56, Juergen Gross wrote: > On 08.03.22 13:50, Jan Beulich wrote: >> On 08.03.2022 09:39, Juergen Gross wrote: >>> On 08.03.22 09:34, Jan Beulich wrote: On 08.12.2021 16:55, Juergen Gross wrote: > In order to avoid indirect function calls on the hypercall path as > much as possible this series is removing the hypercall function tables > and is replacing the hypercall handler calls via the function array > by automatically generated call macros. > > Another by-product of generating the call macros is the automatic > generating of the hypercall handler prototypes from the same data base > which is used to generate the macros. > > This has the additional advantage of using type safe calls of the > handlers and to ensure related handler (e.g. PV and HVM ones) share > the same prototypes. > > A very brief performance test (parallel build of the Xen hypervisor > in a 6 vcpu guest) showed a very slim improvement (less than 1%) of > the performance with the patches applied. The test was performed using > a PV and a PVH guest. 
> > Changes in V2: > - new patches 6, 14, 15 > - patch 7: support hypercall priorities for faster code > - comments addressed > > Changes in V3: > - patches 1 and 4 removed as already applied > - comments addressed > > Juergen Gross (13): > xen: move do_vcpu_op() to arch specific code > xen: harmonize return types of hypercall handlers > xen: don't include asm/hypercall.h from C sources > xen: include compat/platform.h from hypercall.h > xen: generate hypercall interface related code > xen: use generated prototypes for hypercall handlers > x86/pv-shim: don't modify hypercall table > xen/x86: don't use hypercall table for calling compat hypercalls > xen/x86: call hypercall handlers via generated macro > xen/arm: call hypercall handlers via generated macro > xen/x86: add hypercall performance counters for hvm, correct pv > xen: drop calls_to_multicall performance counter > tools/xenperf: update hypercall names As it's pretty certain now that parts of this which didn't go in yet will need re-basing, I'm going to drop this from my waiting-to-be-acked folder, expecting a v4 instead. >>> >>> Yes, I was planning to spin that up soon. >>> >>> The main remaining question is whether we want to switch the return type >>> of all hypercalls (or at least the ones common to all archs) not >>> requiring to return 64-bit values to "int", as Julien requested. >> >> After walking through the earlier discussion (Jürgen - thanks for the link) >> I'm inclined to say that if Arm wants their return values limited to 32 bits >> (with exceptions where needed), so be it. But on x86 I'd rather not see us >> change this aspect. Of course I'd much prefer if architectures didn't >> diverge in this regard, yet then again Arm has already diverged in avoiding >> the compat layer (in this case I view the divergence as helpful, though, as >> it avoids unnecessary headache). > > How to handle this in common code then? 
Have a hypercall_ret_t type > (exact naming TBD) which is defined as long on x86 and int on Arm? > Or use long in the handlers and check the value on Arm side to be a > valid 32-bit signed int (this would be cumbersome for the exceptions, > though)? I was thinking along the lines of hypercall_ret_t, yes, but the compiler wouldn't be helping with spotting truncation issues (we can't reasonably enable the respective warnings, as they would trigger all over the place). If we were to go that route, we'd rely on an initial audit and subsequent patch review to spot issues. Therefore, cumbersome or not, the checking approach may be the more viable one. Then again Julien may have a better plan in mind; I'd anyway expect him to supply details on how he thinks such a transition could be done safely, as he was the one to request limiting to 32 bits. Jan
Re: [PATCH v3 00/13] xen: drop hypercall function tables
On 08.03.22 14:42, Jan Beulich wrote:
> On 08.03.2022 13:56, Juergen Gross wrote:
>> On 08.03.22 13:50, Jan Beulich wrote:
>>> [...]
>>
>> How to handle this in common code then? Have a hypercall_ret_t type
>> (exact naming TBD) which is defined as long on x86 and int on Arm?
>> Or use long in the handlers and check the value on Arm side to be a
>> valid 32-bit signed int (this would be cumbersome for the exceptions,
>> though)?
>
> I was thinking along the lines of hypercall_ret_t, yes, but the
> compiler wouldn't be helping with spotting truncation issues (we can't
> reasonably enable the respective warnings, as they would trigger all
> over the place). If we were to go that route, we'd rely on an initial
> audit and subsequent patch review to spot issues. Therefore,
> cumbersome or not, the checking approach may be the more viable one.
>
> Then again Julien may have a better plan in mind; I'd anyway expect
> him to supply details on how he thinks such a transition could be done
> safely, as he was the one to request limiting to 32 bits.

In order to have some progress I could just leave the Arm side alone
in my series. It could be added later if a solution has been agreed
on.

What do you think?

Juergen
Re: [PATCH 2/4] livepatch: improve rune for fetching of Build ID
> From: Roger Pau Monne > Sent: Wednesday, March 2, 2022 2:27 PM > To: xen-devel@lists.xenproject.org > Cc: Ross Lagerwall ; konrad.w...@oracle.com > ; doe...@amazon.de ; jul...@xen.org > ; Andrew Cooper ; Roger Pau Monne > ; Roger Pau Monné > Subject: [PATCH 2/4] livepatch: improve rune for fetching of Build ID > > The current one is broken with my version of readelf and returns > 'NT_GNU_BUILD_ID'. > > Signed-off-by: Roger Pau Monné > --- > README.md | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/README.md b/README.md > index b48a3df..948a7de 100644 > --- a/README.md > +++ b/README.md > @@ -16,7 +16,7 @@ $ cp -r ~/src/xen ~/src/xenbuild > $ cd ~/src/xen/xen > $ make nconfig # Make sure to set CONFIG_LIVEPATCH=y > $ make > -$ BUILDID=$(readelf -Wn xen-syms | awk '/Build ID:/ {print $3}') > +$ BUILDID=$(readelf -Wn xen-syms | sed -n -e 's/^.*Build ID: //p') > ``` > > Next, build a live patch, using a patch and the source, build ID, and > -- > 2.34.1 Reviewed-by: Ross Lagerwall
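The difference between the old awk rune and the new sed rune can be checked without a xen-syms binary: different binutils versions print the build-id note line differently, and the sed form copes with both. The sample lines below are illustrative approximations of readelf output, not captured from a real binary:

```shell
id='0123456789abcdef0123456789abcdef01234567'

# Older readelf style: "Build ID:" near the start of the line, so
# awk '/Build ID:/ {print $3}' happened to pick the right field.
old="  Build ID: $id"

# Newer readelf prefixes the note type, which is why the awk rune
# started returning 'NT_GNU_BUILD_ID' instead of the hash.
new="  NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: $id"

for line in "$old" "$new"; do
    # The sed rune keeps only what follows the last "Build ID: ".
    printf '%s\n' "$line" | sed -n -e 's/^.*Build ID: //p'
done
```

Both iterations print the same 40-character hash, which is what makes the sed version robust across readelf versions.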
[PATCH v3 0/2] livepatch: enable -f{function,data}-sections compiler option
Hello,

The content in v3 has been split into two patches, but is still mostly
the same. The main difference is that the first patch now does a bit of
cleanup of the build logic, since the header object file no longer needs
to be the first one passed to the linker script.

Thanks, Roger.

Roger Pau Monne (2):
  xen/build: put image header into a separate section
  livepatch: set -f{function,data}-sections compiler option

 xen/Kconfig                 |  4 +++
 xen/Makefile                |  2 ++
 xen/arch/arm/arch.mk        |  2 --
 xen/arch/arm/arm32/Makefile |  3 +--
 xen/arch/arm/arm32/head.S   |  1 +
 xen/arch/arm/arm64/Makefile |  3 +--
 xen/arch/arm/arm64/head.S   |  1 +
 xen/arch/arm/xen.lds.S      | 49 -
 xen/arch/x86/Makefile       |  5 ++--
 xen/arch/x86/arch.mk        |  2 --
 xen/arch/x86/boot/head.S    |  2 +-
 xen/arch/x86/xen.lds.S      | 22 ++---
 xen/common/Kconfig          |  1 +
 13 files changed, 54 insertions(+), 43 deletions(-)

-- 
2.34.1
[PATCH v3 1/2] xen/build: put image header into a separate section
So it can be explicitly placed ahead of the rest of the .text content in the linker script (and thus the resulting image). This is a prerequisite for further work that will add a catch-all to the text section (.text.*). Note that placement of the sections inside of .text is also slightly adjusted to be more similar to the position found in the default GNU ld linker script. The special handling of the object file containing the header data as the first object file passed to the linker command line can also be removed. While there also remove the special handling of efi/ on x86. There's no need for the resulting object file to be passed in any special order to the linker. Signed-off-by: Roger Pau Monné --- Changes since v2: - New in this version (split from patch 2). --- xen/arch/arm/arch.mk| 2 -- xen/arch/arm/arm32/Makefile | 3 +-- xen/arch/arm/arm32/head.S | 1 + xen/arch/arm/arm64/Makefile | 3 +-- xen/arch/arm/arm64/head.S | 1 + xen/arch/arm/xen.lds.S | 8 ++-- xen/arch/x86/Makefile | 5 ++--- xen/arch/x86/arch.mk| 2 -- xen/arch/x86/boot/head.S| 2 +- xen/arch/x86/xen.lds.S | 13 - 10 files changed, 21 insertions(+), 19 deletions(-) diff --git a/xen/arch/arm/arch.mk b/xen/arch/arm/arch.mk index 094b670723..58db76c4e1 100644 --- a/xen/arch/arm/arch.mk +++ b/xen/arch/arm/arch.mk @@ -23,5 +23,3 @@ ifeq ($(CONFIG_ARM64_ERRATUM_843419),y) LDFLAGS += --fix-cortex-a53-843419 endif endif - -ALL_OBJS-y := arch/arm/$(TARGET_SUBARCH)/head.o $(ALL_OBJS-y) diff --git a/xen/arch/arm/arm32/Makefile b/xen/arch/arm/arm32/Makefile index 3040eabce3..520fb42054 100644 --- a/xen/arch/arm/arm32/Makefile +++ b/xen/arch/arm/arm32/Makefile @@ -4,11 +4,10 @@ obj-$(CONFIG_EARLY_PRINTK) += debug.o obj-y += domctl.o obj-y += domain.o obj-y += entry.o +obj-y += head.o obj-y += insn.o obj-$(CONFIG_LIVEPATCH) += livepatch.o obj-y += proc-v7.o proc-caxx.o obj-y += smpboot.o obj-y += traps.o obj-y += vfp.o - -extra-y += head.o diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S index 
7a906167ef..c837d3054c 100644 --- a/xen/arch/arm/arm32/head.S +++ b/xen/arch/arm/arm32/head.S @@ -120,6 +120,7 @@ #endif /* !CONFIG_EARLY_PRINTK */ +.section .text.header, "ax", %progbits .arm /* diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile index baa87655fa..6d507da0d4 100644 --- a/xen/arch/arm/arm64/Makefile +++ b/xen/arch/arm/arm64/Makefile @@ -7,6 +7,7 @@ obj-$(CONFIG_EARLY_PRINTK) += debug.o obj-y += domctl.o obj-y += domain.o obj-y += entry.o +obj-y += head.o obj-y += insn.o obj-$(CONFIG_LIVEPATCH) += livepatch.o obj-y += smc.o @@ -14,5 +15,3 @@ obj-y += smpboot.o obj-y += traps.o obj-y += vfp.o obj-y += vsysreg.o - -extra-y += head.o diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S index 66d862fc81..e62c48ec1c 100644 --- a/xen/arch/arm/arm64/head.S +++ b/xen/arch/arm/arm64/head.S @@ -133,6 +133,7 @@ add \xb, \xb, x20 .endm +.section .text.header, "ax", %progbits /*.aarch64*/ /* diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S index 08016948ab..47d09d6cf1 100644 --- a/xen/arch/arm/xen.lds.S +++ b/xen/arch/arm/xen.lds.S @@ -30,9 +30,13 @@ SECTIONS _start = .; .text : { _stext = .;/* Text section */ + *(.text.header) + + *(.text.cold .text.cold.*) + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text) - *(.text.cold) - *(.text.unlikely) + *(.fixup) *(.gnu.warning) _etext = .; /* End of text section */ diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile index 9c40e0b4d7..04065a7310 100644 --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -1,5 +1,7 @@ obj-y += acpi/ +obj-y += boot/ obj-y += cpu/ +obj-y += efi/ obj-y += genapic/ obj-$(CONFIG_GUEST) += guest/ obj-$(CONFIG_HVM) += hvm/ @@ -77,9 +79,6 @@ obj-$(CONFIG_COMPAT) += x86_64/platform_hypercall.o obj-y += sysctl.o endif -# Allows "clean" to descend into boot/ -subdir- += boot - extra-y += asm-macros.i extra-y += xen.lds diff --git a/xen/arch/x86/arch.mk b/xen/arch/x86/arch.mk index 8e57476d65..c90e56aeab 100644 --- 
a/xen/arch/x86/arch.mk +++ b/xen/arch/x86/arch.mk @@ -117,5 +117,3 @@ endif # Set up the assembler include path properly for older toolchains. CFLAGS += -Wa,-I$(srctree)/include - -ALL_OBJS-y := arch/x86/boot/built_in.o arch/x86/efi/built_in.o $(ALL_OBJS-y) diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index dd1bea0d10..92d73345f0 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -9,7 +9,7 @@ #include #include -.text +.section .text.header, "ax", @progbits .code32 #define sym_offs(sym) ((sym) - __XEN_VIRT_START) diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S index 506bc8e404..715452aad9 100644 --- a/xen/arch/x
[PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option
If livepatching support is enabled, build the hypervisor with the
-f{function,data}-sections compiler options, which are required by the
livepatching tools to detect changes and create livepatches.

This shouldn't result in any functional change in the hypervisor binary
image, but it does require some changes in the linker script in order to
handle the fact that each function and data item is now placed into its
own section in object files. As a result, add catch-alls for .text,
.data and .bss in order to merge each individual item section into the
final image.

The main difference will be that .text.startup will end up being part
of .text rather than .init, and thus won't be freed. .text.exit will
also be part of .text rather than dropped. Overall this could make the
image bigger, and package some .text code in a sub-optimal way.

On Arm, .data.read_mostly needs to be moved ahead of the .data section
like it's already done on x86, so the .data.* catch-all doesn't also
swallow .data.read_mostly. The alignment of .data.read_mostly also needs
to be set to PAGE_SIZE so it doesn't end up placed at the tail of a
read-only page from the previous section. While there, move the
alignment of the .data section ahead of the section declaration, like
it's done for other sections.

The benefit of having CONFIG_LIVEPATCH enable those compiler options is
that the livepatch build tools no longer need to fiddle with the build
system in order to enable them. Note the current livepatch tools are
broken after the recent build changes due to the way they attempt to
set -f{function,data}-sections.

Signed-off-by: Roger Pau Monné
---
Changes since v2:
 - Split the placing of the header code in a separate section to a
   pre-patch.
 - Move Kconfig option to xen/Kconfig.
 - Expand reasoning why .data.read_mostly needs to be moved on Arm.

Changes since v1:
 - Introduce CC_SPLIT_SECTIONS for selecting the compiler options.
 - Drop check for compiler options, all supported versions have them.
- Re-arrange section placement in .text, to match the default linker script. - Introduce .text.header to contain the headers bits that must appear first in the final binary. --- It seems on Arm the schedulers and hypfs .data sections should be moved into read_mostly. --- Tested by gitlab in order to assert I didn't introduce any regression on Arm specially. --- xen/Kconfig| 4 xen/Makefile | 2 ++ xen/arch/arm/xen.lds.S | 41 + xen/arch/x86/xen.lds.S | 9 + xen/common/Kconfig | 1 + 5 files changed, 33 insertions(+), 24 deletions(-) diff --git a/xen/Kconfig b/xen/Kconfig index bcbd2758e5..d134397a0b 100644 --- a/xen/Kconfig +++ b/xen/Kconfig @@ -27,6 +27,10 @@ config CLANG_VERSION config CC_HAS_VISIBILITY_ATTRIBUTE def_bool $(cc-option,-fvisibility=hidden) +# Use -f{function,data}-sections compiler parameters +config CC_SPLIT_SECTIONS + bool + source "arch/$(SRCARCH)/Kconfig" config DEFCONFIG_LIST diff --git a/xen/Makefile b/xen/Makefile index 5c21492d6f..18a4f7e101 100644 --- a/xen/Makefile +++ b/xen/Makefile @@ -273,6 +273,8 @@ else CFLAGS += -fomit-frame-pointer endif +CFLAGS-$(CONFIG_CC_SPLIT_SECTIONS) += -ffunction-sections -fdata-sections + CFLAGS += -nostdinc -fno-builtin -fno-common CFLAGS += -Werror -Wredundant-decls -Wno-pointer-arith $(call cc-option-add,CFLAGS,CC,-Wvla) diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S index 47d09d6cf1..836da880c3 100644 --- a/xen/arch/arm/xen.lds.S +++ b/xen/arch/arm/xen.lds.S @@ -36,6 +36,9 @@ SECTIONS *(.text.unlikely .text.*_unlikely .text.unlikely.*) *(.text) +#ifdef CONFIG_CC_SPLIT_SECTIONS + *(.text.*) +#endif *(.fixup) *(.gnu.warning) @@ -82,10 +85,24 @@ SECTIONS #endif _erodata = .;/* End of read-only data */ + . = ALIGN(PAGE_SIZE); + .data.read_mostly : { + /* Exception table */ + __start___ex_table = .; + *(.ex_table) + __stop___ex_table = .; + + /* Pre-exception table */ + __start___pre_ex_table = .; + *(.ex_table.pre) + __stop___pre_ex_table = .; + + *(.data.read_mostly) + } :text + + . 
= ALIGN(SMP_CACHE_BYTES); .data : {/* Data */ - . = ALIGN(PAGE_SIZE); *(.data.page_aligned) - *(.data) . = ALIGN(8); __start_schedulers_array = .; *(.data.schedulers) @@ -98,26 +115,10 @@ SECTIONS __paramhypfs_end = .; #endif - *(.data.rel) - *(.data.rel.*) + *(.data .data.*) CONSTRUCTORS } :text - . = ALIGN(SMP_CACHE_BYTES); - .data.read_mostly : { - /* Exception table */ - __start___ex_table = .; - *(.ex_table) - __stop___ex_table = .; - - /* Pre-exception table */ - __start___pre_ex_table = .; - *(.ex_table.pre) - __stop___pre_ex_table = .; - - *(.data.read_mostly) - } :text - . = ALI
Re: [PATCH 3/4] livepatch: do the initial build using CROSS_COMPILE
> From: Roger Pau Monne > Sent: Wednesday, March 2, 2022 2:27 PM > To: xen-devel@lists.xenproject.org > Cc: Ross Lagerwall ; konrad.w...@oracle.com > ; doe...@amazon.de ; jul...@xen.org > ; Andrew Cooper ; Roger Pau Monne > > Subject: [PATCH 3/4] livepatch: do the initial build using CROSS_COMPILE > > Setting it afterwards for further builds will cause the build logic to > detect a change and thus force a rebuild of all sources. > > Signed-off-by: Roger Pau Monné > --- > livepatch-build | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/livepatch-build b/livepatch-build > index e1715ea..38a92be 100755 > --- a/livepatch-build > +++ b/livepatch-build > @@ -92,7 +92,6 @@ function build_special() > cd "${SRCDIR}" || die > > # Capture .o files from the patched build > -export CROSS_COMPILE="${TOOLSDIR}/livepatch-gcc " > export LIVEPATCH_BUILD_DIR="$(pwd)/" > export LIVEPATCH_CAPTURE_DIR="$OUTPUT/${name}" > mkdir -p "$LIVEPATCH_CAPTURE_DIR" > @@ -408,6 +407,8 @@ if [ "${SKIP}" != "build" ]; then > XEN_DEBUG="debug=$XEN_DEBUG" > fi > > +export CROSS_COMPILE="${TOOLSDIR}/livepatch-gcc " > + > echo "Perform full initial build with ${CPUS} CPU(s)..." > build_full > > -- > 2.34.1 Reviewed-by: Ross Lagerwall
Re: [PATCH v3 00/13] xen: drop hypercall function tables
On 08.03.2022 14:44, Juergen Gross wrote: > On 08.03.22 14:42, Jan Beulich wrote: >> On 08.03.2022 13:56, Juergen Gross wrote: >>> On 08.03.22 13:50, Jan Beulich wrote: On 08.03.2022 09:39, Juergen Gross wrote: > On 08.03.22 09:34, Jan Beulich wrote: >> On 08.12.2021 16:55, Juergen Gross wrote: >>> In order to avoid indirect function calls on the hypercall path as >>> much as possible this series is removing the hypercall function tables >>> and is replacing the hypercall handler calls via the function array >>> by automatically generated call macros. >>> >>> Another by-product of generating the call macros is the automatic >>> generating of the hypercall handler prototypes from the same data base >>> which is used to generate the macros. >>> >>> This has the additional advantage of using type safe calls of the >>> handlers and to ensure related handler (e.g. PV and HVM ones) share >>> the same prototypes. >>> >>> A very brief performance test (parallel build of the Xen hypervisor >>> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of >>> the performance with the patches applied. The test was performed using >>> a PV and a PVH guest. 
>>> >>> Changes in V2: >>> - new patches 6, 14, 15 >>> - patch 7: support hypercall priorities for faster code >>> - comments addressed >>> >>> Changes in V3: >>> - patches 1 and 4 removed as already applied >>> - comments addressed >>> >>> Juergen Gross (13): >>> xen: move do_vcpu_op() to arch specific code >>> xen: harmonize return types of hypercall handlers >>> xen: don't include asm/hypercall.h from C sources >>> xen: include compat/platform.h from hypercall.h >>> xen: generate hypercall interface related code >>> xen: use generated prototypes for hypercall handlers >>> x86/pv-shim: don't modify hypercall table >>> xen/x86: don't use hypercall table for calling compat hypercalls >>> xen/x86: call hypercall handlers via generated macro >>> xen/arm: call hypercall handlers via generated macro >>> xen/x86: add hypercall performance counters for hvm, correct pv >>> xen: drop calls_to_multicall performance counter >>> tools/xenperf: update hypercall names >> >> As it's pretty certain now that parts of this which didn't go in yet will >> need re-basing, I'm going to drop this from my waiting-to-be-acked >> folder, >> expecting a v4 instead. > > Yes, I was planning to spin that up soon. > > The main remaining question is whether we want to switch the return type > of all hypercalls (or at least the ones common to all archs) not > requiring to return 64-bit values to "int", as Julien requested. After walking through the earlier discussion (Jürgen - thanks for the link) I'm inclined to say that if Arm wants their return values limited to 32 bits (with exceptions where needed), so be it. But on x86 I'd rather not see us change this aspect. Of course I'd much prefer if architectures didn't diverge in this regard, yet then again Arm has already diverged in avoiding the compat layer (in this case I view the divergence as helpful, though, as it avoids unnecessary headache). >>> >>> How to handle this in common code then? 
Have a hypercall_ret_t type >>> (exact naming TBD) which is defined as long on x86 and int on Arm? >>> Or use long in the handlers and check the value on Arm side to be a >>> valid 32-bit signed int (this would be cumbersome for the exceptions, >>> though)? >> >> I was thinking along the lines of hypercall_ret_t, yes, but the >> compiler wouldn't be helping with spotting truncation issues (we can't >> reasonably enable the respective warnings, as they would trigger all >> over the place). If we were to go that route, we'd rely on an initial >> audit and subsequent patch review to spot issues. Therefore, >> cumbersome or not, the checking approach may be the more viable one. >> >> Then again Julien may have a better plan in mind; I'd anyway expect >> him to supply details on how he thinks such a transition could be done >> safely, as he was the one to request limiting to 32 bits. > > In order to have some progress I could just leave the Arm side alone > in my series. It could be added later if a solution has been agreed > on. > > What do you think? I see no issue with this if there's no other dependency on Arm following suit. Jan
Re: [PATCH v3 1/2] xen/build: put image header into a separate section
On 08.03.2022 14:49, Roger Pau Monne wrote: > So it can be explicitly placed ahead of the rest of the .text content > in the linker script (and thus the resulting image). This is a > prerequisite for further work that will add a catch-all to the text > section (.text.*). > > Note that placement of the sections inside of .text is also slightly > adjusted to be more similar to the position found in the default GNU > ld linker script. > > The special handling of the object file containing the header data as > the first object file passed to the linker command line can also be > removed. > > While there also remove the special handling of efi/ on x86. There's > no need for the resulting object file to be passed in any special > order to the linker. > > Signed-off-by: Roger Pau Monné Looks good to me, but I have one question before feeling ready to offer R-b: > @@ -86,8 +84,13 @@ SECTIONS > *(.text.kexec) /* Page aligned in the object file. */ > kexec_reloc_end = .; > > - *(.text.cold) > - *(.text.unlikely) > + *(.text.cold .text.cold.*) > + *(.text.unlikely .text.*_unlikely .text.unlikely.*) What generates .text.*_unlikely? And if anything really does, why would .text.cold not have a similar equivalent? Jan
Re: [PATCH 4/4] livepatch: differentiate between old and new build systems
> From: Roger Pau Monne
> Sent: Wednesday, March 2, 2022 2:27 PM
> To: xen-devel@lists.xenproject.org
> Cc: Ross Lagerwall ; konrad.w...@oracle.com ; doe...@amazon.de ; jul...@xen.org ; Andrew Cooper ; Roger Pau Monne
> Subject: [PATCH 4/4] livepatch: differentiate between old and new build systems
>
> Do not attempt to modify the build system if CFLAGS are not set in
> Rules.mk, and instead rely on CONFIG_LIVEPATCH already setting
> -f{function,data}-sections.
>
> Signed-off-by: Roger Pau Monné
> ---
> This depends on getting the patch to add -f{function,data}-sections
> when using CONFIG_LIVEPATCH accepted.
> ---
>  livepatch-build | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/livepatch-build b/livepatch-build
> index 38a92be..656cdac 100755
> --- a/livepatch-build
> +++ b/livepatch-build
> @@ -98,14 +98,20 @@ function build_special()
>
>      # Build with special GCC flags
>      cd "${SRCDIR}/xen" || die
> -    sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/' Rules.mk
> -    cp -p arch/x86/Makefile arch/x86/Makefile.bak
> -    sed -i 's/--section-alignment=0x20/--section-alignment=0x1000/' arch/x86/Makefile
> -    # Restore timestamps to prevent spurious rebuilding
> -    touch --reference=arch/x86/Makefile.bak arch/x86/Makefile
> -    make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
> -    sed -i 's/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/CFLAGS += -nostdinc/' Rules.mk
> -    mv -f arch/x86/Makefile.bak arch/x86/Makefile
> +    if grep -q 'nostdinc' Rules.mk; then
> +        # Support for old build system, attempt to set -f{function,data}-sections and rebuild
> +        sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/' Rules.mk
> +        cp -p arch/x86/Makefile arch/x86/Makefile.bak
> +        sed -i 's/--section-alignment=0x20/--section-alignment=0x1000/' arch/x86/Makefile
> +        # Restore timestamps to prevent spurious rebuilding
> +        touch --reference=arch/x86/Makefile.bak arch/x86/Makefile
> +        make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
> +        sed -i 's/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/CFLAGS += -nostdinc/' Rules.mk
> +        mv -f arch/x86/Makefile.bak arch/x86/Makefile
> +    else
> +        # -f{function,data}-sections set by CONFIG_LIVEPATCH
> +        make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
> +    fi
>
>      unset LIVEPATCH_BUILD_DIR
>      unset LIVEPATCH_CAPTURE_DIR
> --
> 2.34.1

Reviewed-by: Ross Lagerwall
[PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
For livepatching, we need to look at a potentially clobbered function and
determine whether it used to have an ENDBR64 instruction.

Use a non-default 4-byte P6 long nop, not emitted by toolchains, and
introduce the was_endbr64() predicate.

Signed-off-by: Andrew Cooper
---
CC: Jan Beulich
CC: Roger Pau Monné
CC: Wei Liu
CC: Bjoern Doebel
CC: Michael Kurth
CC: Martin Pohlack

Bjoern: For the livepatching code, I think you want:

  if ( is_endbr64(...) || was_endbr64(...) )
      needed += ENDBR64_LEN;
---
 xen/arch/x86/alternative.c       | 10 +++++++++-
 xen/arch/x86/include/asm/endbr.h | 12 ++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index d41eeef1bcaf..ffb1b1d960c8 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -362,7 +362,15 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
         if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
             continue;

-        add_nops(ptr, ENDBR64_LEN);
+        /*
+         * Can't use add_nops() here.  ENDBR64_POISON is specifically
+         * different to NOP4 so it can be spotted after the fact.
+         *
+         * All CET-capable hardware uses P6 NOPS (no need to plumb through
+         * ideal_nops), and doesn't require a branch to synchronise the
+         * instruction stream.
+         */
+        memcpy(ptr, ENDBR64_POISON, ENDBR64_LEN);
         clobbered++;
     }

diff --git a/xen/arch/x86/include/asm/endbr.h b/xen/arch/x86/include/asm/endbr.h
index 6090afeb0bd8..5e1e55cb467d 100644
--- a/xen/arch/x86/include/asm/endbr.h
+++ b/xen/arch/x86/include/asm/endbr.h
@@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
     *(uint32_t *)ptr = gen_endbr64();
 }

+/*
+ * After clobbering ENDBR64, we may need to confirm that the site used to
+ * contain an ENDBR64 instruction.  Use an encoding which isn't the default
+ * P6_NOP4.
+ */
+#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
+
+static inline bool was_endbr64(const void *ptr)
+{
+    return *(const uint32_t *)ptr == 0x001f0f66;
+}
+
 #endif /* XEN_ASM_ENDBR_H */
--
2.11.0
Re: [PATCH v2] x86/build: use --orphan-handling linker option if available
On Tue, Mar 08, 2022 at 01:34:06PM +0100, Jan Beulich wrote: > On 08.03.2022 13:11, Roger Pau Monné wrote: > > On Tue, Mar 08, 2022 at 12:15:04PM +0100, Jan Beulich wrote: > >> On 08.03.2022 11:12, Roger Pau Monné wrote: > >>> On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote: > @@ -179,6 +188,13 @@ SECTIONS > #endif > #endif > > +#ifndef EFI > + /* Retain these just for the purpose of possible analysis tools. */ > + DECL_SECTION(.note) { > + *(.note.*) > + } PHDR(note) PHDR(text) > >>> > >>> Wouldn't it be enough to place it in the note program header? > >>> > >>> The buildid note is already placed in .rodata, so any remaining notes > >>> don't need to be in a LOAD section? > >> > >> All the notes will be covered by the NOTE phdr. I had this much later > >> in the script originally, but then the NOTE phdr covered large parts of > >> .init.*. Clearly that yields invalid notes, which analysis (or simple > >> dumping) tools wouldn't be happy about. We might be able to add 2nd > >> NOTE phdr, but mkelf32 assumes exactly 2 phdrs if it finds more than > >> one, so changes there would likely be needed then (which I'd like to > >> avoid for the moment). I'm also not sure in how far tools can be > >> expected to look for multiple NOTE phdrs ... > > > > But if we are adding a .note section now we might as well merge it > > with .note.gnu.build-id: > > > > DECL_SECTION(.note) { > >__note_gnu_build_id_start = .; > >*(.note.gnu.build-id) > >__note_gnu_build_id_end = .; > >*(.note.*) > > } PHDR(note) PHDR(text) > > > > And drop the .note.Xen section? > > In an ideal world we likely could, yes. But do we know for sure that > nothing recognizes the Xen notes by section name? Wouldn't that be wrong? In the elfnotes.h file it's clearly specified that Xen notes live in a PT_NOTE program header and have 'Xen' in the name field. There's no requirement of them being in any specific section. 
> .note.gnu.build-id > cannot be folded in any event - see the rule for generating note.o, > to be used by xen.efi linking in certain cases. Right, so we need to keep the .note.gnu.build-id section, but we could likely fold .note.Xen into .note I think? Or at least add a comment to mention that we don't want to fold .note.Xen into .note in case there are tools that search for specific Xen notes to be contained in .note.Xen. > +#endif > + > _erodata = .; > > . = ALIGN(SECTION_ALIGN); > @@ -266,6 +282,32 @@ SECTIONS > __ctors_end = .; > } PHDR(text) > > +#ifndef EFI > + /* > + * With --orphan-sections=warn (or =error) we need to handle certain > linker > + * generated sections. These are all expected to be empty; respective > + * ASSERT()s can be found towards the end of this file. > + */ > + DECL_SECTION(.got) { > + *(.got) > + } PHDR(text) > + DECL_SECTION(.got.plt) { > + *(.got.plt) > + } PHDR(text) > + DECL_SECTION(.igot.plt) { > + *(.igot.plt) > + } PHDR(text) > + DECL_SECTION(.iplt) { > + *(.iplt) > + } PHDR(text) > + DECL_SECTION(.plt) { > + *(.plt) > + } PHDR(text) > + DECL_SECTION(.rela) { > + *(.rela.*) > + } PHDR(text) > >>> > >>> Why do you need to explicitly place those in the text program header? > >> > >> I guess that's largely for consistency with all other directives. With the > >> assertions that these need to be empty, we might get away without, as long > >> as no linker would decide to set up another zero-size phdr for them. > > > > We already set the debug sections to not be part of any program header > > and seem to get away with it. I'm not sure how different the sections > > handled below would be, linkers might indeed want to place them > > regardless? > > Simply because I don't know I'd like to be on the safe side. Debug sections > can't really be taken as reference: At least GNU ld heavily special-cases > them anyway. 
> > > If so it might be good to add a comment that while those should be > > empty (and thus don't end up in any program header) we assign them to > > the text one in order to avoid the linker from creating a new program > > header for them. > > I'll add a sentence to the comment I'm already adding here. Thanks, Roger.
Re: [PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option
On 08.03.2022 14:49, Roger Pau Monne wrote: > If livepatching support is enabled build the hypervisor with > -f{function,data}-sections compiler options, which is required by the > livepatching tools to detect changes and create livepatches. > > This shouldn't result in any functional change on the hypervisor > binary image, but does however require some changes in the linker > script in order to handle that each function and data item will now be > placed into its own section in object files. As a result add catch-all > for .text, .data and .bss in order to merge each individual item > section into the final image. > > The main difference will be that .text.startup will end up being part > of .text rather than .init, and thus won't be freed. .text.exit will > also be part of .text rather than dropped. Overall this could make the > image bigger, and package some .text code in a sub-optimal way. > > On Arm the .data.read_mostly needs to be moved ahead of the .data > section like it's already done on x86, so the .data.* catch-all > doesn't also include .data.read_mostly. The alignment of > .data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end > up being placed at the tail of a read-only page from the previous > section. While there move the alignment of the .data section ahead of > the section declaration, like it's done for other sections. > > The benefit of having CONFIG_LIVEPATCH enable those compiler option > is that the livepatch build tools no longer need to fiddle with the > build system in order to enable them. Note the current livepatch tools > are broken after the recent build changes due to the way they > attempt to set -f{function,data}-sections. 
> >
> > Signed-off-by: Roger Pau Monné

Reviewed-by: Jan Beulich

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -88,6 +88,9 @@ SECTIONS
>          *(.text.unlikely .text.*_unlikely .text.unlikely.*)
>
>          *(.text)
> +#ifdef CONFIG_CC_SPLIT_SECTIONS
> +        *(.text.*)
> +#endif
>          *(.text.__x86_indirect_thunk_*)
>          *(.text.page_aligned)

These last two now will not have any effect anymore when
CC_SPLIT_SECTIONS=y.  This may have undesirable effects on the overall
size when there is more than one object with a .text.page_aligned
contribution.  In .data ...

> @@ -292,9 +295,7 @@ SECTIONS
>
>    DECL_SECTION(.data) {
>         *(.data.page_aligned)
> -       *(.data)
> -       *(.data.rel)
> -       *(.data.rel.*)
> +       *(.data .data.*)
>    } PHDR(text)

... this continues to be named first.  I wonder whether we wouldn't want
to use SORT_BY_ALIGNMENT (if available) instead in both places.

Jan
Re: [PATCH v3 1/2] xen/build: put image header into a separate section
On 08/03/2022 13:49, Roger Pau Monne wrote:
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 08016948ab..47d09d6cf1 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -30,9 +30,13 @@ SECTIONS
>    _start = .;
>    .text : {
>         _stext = .;            /* Text section */
> +       *(.text.header)
> +
> +       *(.text.cold .text.cold.*)
> +       *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> +
>         *(.text)
> -       *(.text.cold)
> -       *(.text.unlikely)
> +

Most of this hunk wants to move into patch 2.  Patch 1 should only
introduce .text.header.

Same for x86.  Can fix on commit.

~Andrew
Re: [PATCH v2] x86/build: use --orphan-handling linker option if available
On 08.03.2022 15:07, Roger Pau Monné wrote: > On Tue, Mar 08, 2022 at 01:34:06PM +0100, Jan Beulich wrote: >> On 08.03.2022 13:11, Roger Pau Monné wrote: >>> On Tue, Mar 08, 2022 at 12:15:04PM +0100, Jan Beulich wrote: On 08.03.2022 11:12, Roger Pau Monné wrote: > On Mon, Mar 07, 2022 at 02:53:32PM +0100, Jan Beulich wrote: >> @@ -179,6 +188,13 @@ SECTIONS >> #endif >> #endif >> >> +#ifndef EFI >> + /* Retain these just for the purpose of possible analysis tools. */ >> + DECL_SECTION(.note) { >> + *(.note.*) >> + } PHDR(note) PHDR(text) > > Wouldn't it be enough to place it in the note program header? > > The buildid note is already placed in .rodata, so any remaining notes > don't need to be in a LOAD section? All the notes will be covered by the NOTE phdr. I had this much later in the script originally, but then the NOTE phdr covered large parts of .init.*. Clearly that yields invalid notes, which analysis (or simple dumping) tools wouldn't be happy about. We might be able to add 2nd NOTE phdr, but mkelf32 assumes exactly 2 phdrs if it finds more than one, so changes there would likely be needed then (which I'd like to avoid for the moment). I'm also not sure in how far tools can be expected to look for multiple NOTE phdrs ... >>> >>> But if we are adding a .note section now we might as well merge it >>> with .note.gnu.build-id: >>> >>> DECL_SECTION(.note) { >>>__note_gnu_build_id_start = .; >>>*(.note.gnu.build-id) >>>__note_gnu_build_id_end = .; >>>*(.note.*) >>> } PHDR(note) PHDR(text) >>> >>> And drop the .note.Xen section? >> >> In an ideal world we likely could, yes. But do we know for sure that >> nothing recognizes the Xen notes by section name? > > Wouldn't that be wrong? In the elfnotes.h file it's clearly specified > that Xen notes live in a PT_NOTE program header and have 'Xen' in the > name field. There's no requirement of them being in any specific > section. True. 
But ELF also tells us to not go from section names (only), but to consider type and attribute as well. Yet what do most tools do? >> .note.gnu.build-id >> cannot be folded in any event - see the rule for generating note.o, >> to be used by xen.efi linking in certain cases. > > Right, so we need to keep the .note.gnu.build-id section, but we could > likely fold .note.Xen into .note I think? > > Or at least add a comment to mention that we don't want to fold > .note.Xen into .note in case there are tools that search for specific > Xen notes to be contained in .note.Xen. I can add such a comment, sure. Jan
Re: [XEN PATCH v9 06/30] build: rework test/livepatch/Makefile
> From: Anthony PERARD
> Sent: Tuesday, January 25, 2022 11:00 AM
> To: xen-devel@lists.xenproject.org
> Cc: Anthony Perard ; Jan Beulich ; Andrew Cooper ; George Dunlap ; Julien Grall ; Stefano Stabellini ; Wei Liu ; Konrad Rzeszutek Wilk ; Ross Lagerwall
> Subject: [XEN PATCH v9 06/30] build: rework test/livepatch/Makefile
>
> This reworks the livepatch/Makefile to make it less repetitive and makes
> use of the facilities. All the targets to be built are now listed in
> $(extra-y), which will allow Rules.mk to build them without the need of
> a local target in a future patch.
>
> There are some changes/fixes in this patch:
> - when "xen-syms" is used for a target, it is added to the dependency
>   list of the target, which allows rebuilding the target when xen-syms
>   changes. But if "xen-syms" is missing, make simply fails.
> - modinfo.o wasn't removing its $@.bin file like the other targets;
>   this is now done.
> - The command to build *.livepatch targets has been fixed to use
>   $(XEN_LDFLAGS) rather than just $(LDFLAGS), which is a fallout from
>   2740d96efdd3 ("xen/build: have the root Makefile generates the
>   CFLAGS")
>
> make will find out the dependencies of the *.livepatch files, and thus
> what to build, by "looking" at the objects listed in the *-objs
> variables. The actual dependencies are generated by the new
> "multi-depend" macro.
>
> "$(targets)" needs to be updated with the objects listed in the
> different *-objs variables to allow make to load the .*.cmd dependency
> files.
>
> This patch copies the macro "multi_depend" from Linux 5.12, and renames
> it to "multi-depend".
>
> Signed-off-by: Anthony PERARD
> Acked-by: Jan Beulich
> ---

Acked-by: Ross Lagerwall
Re: [PATCH v3 1/2] xen/build: put image header into a separate section
On Tue, Mar 08, 2022 at 02:57:23PM +0100, Jan Beulich wrote:
> On 08.03.2022 14:49, Roger Pau Monne wrote:
> > So it can be explicitly placed ahead of the rest of the .text content
> > in the linker script (and thus the resulting image). This is a
> > prerequisite for further work that will add a catch-all to the text
> > section (.text.*).
> >
> > Note that placement of the sections inside of .text is also slightly
> > adjusted to be more similar to the position found in the default GNU
> > ld linker script.
> >
> > The special handling of the object file containing the header data as
> > the first object file passed to the linker command line can also be
> > removed.
> >
> > While there also remove the special handling of efi/ on x86. There's
> > no need for the resulting object file to be passed in any special
> > order to the linker.
> >
> > Signed-off-by: Roger Pau Monné
>
> Looks good to me, but I have one question before feeling ready to
> offer R-b:
>
> > @@ -86,8 +84,13 @@ SECTIONS
> >          *(.text.kexec)        /* Page aligned in the object file. */
> >          kexec_reloc_end = .;
> >
> > -        *(.text.cold)
> > -        *(.text.unlikely)
> > +        *(.text.cold .text.cold.*)
> > +        *(.text.unlikely .text.*_unlikely .text.unlikely.*)
>
> What generates .text.*_unlikely? And if anything really does, why
> would .text.cold not have a similar equivalent?

That matches what I saw in the default linker script from my version
of GNU ld:

    *(.text.unlikely .text.*_unlikely .text.unlikely.*)

I really don't know what could generate .text.*_unlikely, but since
it's part of the default linker script I assumed it was better to just
add it.

Thanks, Roger.
Re: [PATCH v3 1/2] xen/build: put image header into a separate section
On Tue, Mar 08, 2022 at 02:11:28PM +, Andrew Cooper wrote:
> On 08/03/2022 13:49, Roger Pau Monne wrote:
> > diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> > index 08016948ab..47d09d6cf1 100644
> > --- a/xen/arch/arm/xen.lds.S
> > +++ b/xen/arch/arm/xen.lds.S
> > @@ -30,9 +30,13 @@ SECTIONS
> >    _start = .;
> >    .text : {
> >         _stext = .;            /* Text section */
> > +       *(.text.header)
> > +
> > +       *(.text.cold .text.cold.*)
> > +       *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> > +
> >         *(.text)
> > -       *(.text.cold)
> > -       *(.text.unlikely)
> > +
>
> Most of this hunk wants to move into patch 2. Patch 1 should only
> introduce .text.header.
>
> Same for x86. Can fix on commit.

The justification for doing it here is to better match the ordering
used by the default linker script from GNU ld. Obviously this also
benefits the following patch.

Feel free to move, but I think it's also fine to do it here.

Thanks, Roger.
Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
On 08/03/2022 12:33, Roger Pau Monné wrote: > On Tue, Mar 08, 2022 at 01:24:23PM +0100, Jan Beulich wrote: >> On 08.03.2022 12:38, Roger Pau Monné wrote: >>> On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote: @@ -685,13 +687,31 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) } } -if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED ) +if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED | + XEN_X86_ASSISTED_XAPIC | + XEN_X86_ASSISTED_X2APIC) ) { dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n", config->arch.misc_flags); return -EINVAL; } +if ( (assisted_xapic || assisted_x2apic) && !hvm ) +{ +dprintk(XENLOG_INFO, +"Interrupt Controller Virtualization not supported for PV\n"); +return -EINVAL; +} + +if ( (assisted_xapic && !assisted_xapic_available) || + (assisted_x2apic && !assisted_x2apic_available) ) +{ +dprintk(XENLOG_INFO, +"Hardware assisted x%sAPIC requested but not available\n", +assisted_xapic && !assisted_xapic_available ? "" : "2"); +return -EINVAL; >>> >>> I think for those two you could return -ENODEV if others agree. >> >> If by "two" you mean the xAPIC and x2APIC aspects here (and not e.g. this >> and the earlier if()), then I agree. I'm always in favor of using distinct >> error codes when possible and at least halfway sensible. > > I would be fine by using it for the !hvm if also. IMO it makes sense > as PV doesn't have an APIC 'device' at all, so ENODEV would seem > fitting. EINVAL is also fine as the caller shouldn't even attempt that > in the first place. > > So let's use it for the last if only. Wouldn't it make more sense to use -ENODEV particularly for the first? I agree that -ENODEV should be reported in the first case because it doesn't make sense to request acceleration of something that doesn't exist and I should have put that. 
But having a look at the hap code (since it resembles the second case),
it returns -EINVAL when it is not available, unless you deem this to be
different or, in retrospect, that the hap code should too have been
coded to return -ENODEV.

    if ( hap && !hvm_hap_supported() )
    {
        dprintk(XENLOG_INFO, "HAP requested but not available\n");
        return -EINVAL;
    }

I agree with all the other comments and have made the appropriate
changes for v6.

Thanks a lot,

Jane.
Re: [PATCH 1/2] Livepatch: resolve old address before function verification
> From: Bjoern Doebel
> Sent: Monday, March 7, 2022 11:53 AM
> To: xen-devel@lists.xenproject.org
> Cc: Michael Kurth ; Martin Pohlack ; Roger Pau Monne ; Andrew Cooper ; Bjoern Doebel ; Konrad Rzeszutek Wilk ; Ross Lagerwall
> Subject: [PATCH 1/2] Livepatch: resolve old address before function verification
>
> When verifying that a livepatch can be applied, we may as well want to
> inspect the target function to be patched. To do so, we need to resolve
> this function's address before running the arch-specific
> livepatch_verify hook.
>
> Signed-off-by: Bjoern Doebel
> CC: Konrad Rzeszutek Wilk
> CC: Ross Lagerwall
> ---
>  xen/common/livepatch.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> index ec301a9f12..be2cf75c2d 100644
> --- a/xen/common/livepatch.c
> +++ b/xen/common/livepatch.c
> @@ -684,11 +684,11 @@ static int prepare_payload(struct payload *payload,
>          return -EINVAL;
>      }
>
> -    rc = arch_livepatch_verify_func(f);
> +    rc = resolve_old_address(f, elf);
>      if ( rc )
>          return rc;
>
> -    rc = resolve_old_address(f, elf);
> +    rc = arch_livepatch_verify_func(f);
>      if ( rc )
>          return rc;
>
> --
> 2.32.0

Reviewed-by: Ross Lagerwall
Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
On 08.03.2022 15:01, Andrew Cooper wrote:
> For livepatching, we need to look at a potentially clobbered function and
> determine whether it used to have an ENDBR64 instruction.
>
> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and introduce
> the was_endbr64() predicate.

Did you consider using ENDBR32 for this purpose? I'm worried that the
pattern you picked is still too close to what might be used (down the
road) by a tool chain. If ENDBR32 isn't suitable for some reason, how
about "nop %sp" or "nopw (%rsp)" (and then maybe even "data16"
substituted by rex, cs, ds, es, or ss)?

One neat thing about ENDBR32 would be that you wouldn't even need
memcpy() - you'd merely flip the low main opcode bit.

> Bjoern: For the livepatching code, I think you want:
>
>   if ( is_endbr64(...) || was_endbr64(...) )
>       needed += ENDBR64_LEN;

Looks like I didn't fully understand the problem then from your initial
description. The adjustment here (and the one needed in Björn's patch)
is to compensate for the advancing of the targets of altcalls past the
ENDBR?

> --- a/xen/arch/x86/include/asm/endbr.h
> +++ b/xen/arch/x86/include/asm/endbr.h
> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>      *(uint32_t *)ptr = gen_endbr64();
>  }
>
> +/*
> + * After clobbering ENDBR64, we may need to confirm that the site used to
> + * contain an ENDBR64 instruction. Use an encoding which isn't the default
> + * P6_NOP4.
> + */
> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */

In case this remains as is - did you mean "opsz" instead of "osp"? But
this really is "nopw (%rax)" anyway.

Jan
Re: [PATCH 4/4] livepatch: differentiate between old and new build systems
On 02/03/2022 14:27, Roger Pau Monne wrote:
> diff --git a/livepatch-build b/livepatch-build
> index 38a92be..656cdac 100755
> --- a/livepatch-build
> +++ b/livepatch-build
> @@ -98,14 +98,20 @@ function build_special()
>
>      # Build with special GCC flags
>      cd "${SRCDIR}/xen" || die
> -    sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/' Rules.mk
> -    cp -p arch/x86/Makefile arch/x86/Makefile.bak
> -    sed -i 's/--section-alignment=0x20/--section-alignment=0x1000/' arch/x86/Makefile
> -    # Restore timestamps to prevent spurious rebuilding
> -    touch --reference=arch/x86/Makefile.bak arch/x86/Makefile
> -    make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
> -    sed -i 's/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/CFLAGS += -nostdinc/' Rules.mk
> -    mv -f arch/x86/Makefile.bak arch/x86/Makefile
> +    if grep -q 'nostdinc' Rules.mk; then
> +        # Support for old build system, attempt to set -f{function,data}-sections and rebuild
> +        sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/' Rules.mk
> +        cp -p arch/x86/Makefile arch/x86/Makefile.bak
> +        sed -i 's/--section-alignment=0x20/--section-alignment=0x1000/' arch/x86/Makefile
> +        # Restore timestamps to prevent spurious rebuilding
> +        touch --reference=arch/x86/Makefile.bak arch/x86/Makefile
> +        make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
> +        sed -i 's/CFLAGS += -nostdinc -ffunction-sections -fdata-sections/CFLAGS += -nostdinc/' Rules.mk
> +        mv -f arch/x86/Makefile.bak arch/x86/Makefile
> +    else
> +        # -f{function,data}-sections set by CONFIG_LIVEPATCH
> +        make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
> +    fi

This really ought to be the other way around, by spotting the thing we
know is good, and then falling back to the
heuristics.

In light of the updates to the Xen side, something like:

    if grep -q CC_SPLIT_SECTIONS Kconfig; then
        # -f{function,data}-sections set by CONFIG_LIVEPATCH
        make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || die
    else
        # Support for old build system, attempt to set -f{function,data}-sections and rebuild
        ...
    fi

?

~Andrew
Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
On 08.03.2022 15:31, Jane Malalane wrote: > On 08/03/2022 12:33, Roger Pau Monné wrote: >> On Tue, Mar 08, 2022 at 01:24:23PM +0100, Jan Beulich wrote: >>> On 08.03.2022 12:38, Roger Pau Monné wrote: On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote: > @@ -685,13 +687,31 @@ int arch_sanitise_domain_config(struct > xen_domctl_createdomain *config) > } > } > > -if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED ) > +if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED | > + XEN_X86_ASSISTED_XAPIC | > + XEN_X86_ASSISTED_X2APIC) ) > { > dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n", > config->arch.misc_flags); > return -EINVAL; > } > > +if ( (assisted_xapic || assisted_x2apic) && !hvm ) > +{ > +dprintk(XENLOG_INFO, > +"Interrupt Controller Virtualization not supported for > PV\n"); > +return -EINVAL; > +} > + > +if ( (assisted_xapic && !assisted_xapic_available) || > + (assisted_x2apic && !assisted_x2apic_available) ) > +{ > +dprintk(XENLOG_INFO, > +"Hardware assisted x%sAPIC requested but not > available\n", > +assisted_xapic && !assisted_xapic_available ? "" : "2"); > +return -EINVAL; I think for those two you could return -ENODEV if others agree. >>> >>> If by "two" you mean the xAPIC and x2APIC aspects here (and not e.g. this >>> and the earlier if()), then I agree. I'm always in favor of using distinct >>> error codes when possible and at least halfway sensible. >> >> I would be fine by using it for the !hvm if also. IMO it makes sense >> as PV doesn't have an APIC 'device' at all, so ENODEV would seem >> fitting. EINVAL is also fine as the caller shouldn't even attempt that >> in the first place. >> >> So let's use it for the last if only. > Wouldn't it make more sense to use -ENODEV particularly for the first? I > agree that -ENODEV should be reported in the first case because it > doesn't make sense to request acceleration of something that doesn't > exist and I should have put that. 
> But having a look at the hap code
> (since it resembles the second case), it returns -EINVAL when it is not
> available, unless you deem this to be different or, in retrospective,
> that the hap code should too have been coded to return -ENODEV.
>
>     if ( hap && !hvm_hap_supported() )
>     {
>         dprintk(XENLOG_INFO, "HAP requested but not available\n");
>         return -EINVAL;
>     }

This is just one of the examples where using -ENODEV as you suggest
would introduce an inconsistency. We use -EINVAL also for other purely
guest-type dependent checks.

Jan
Re: [PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option
On Tue, Mar 08, 2022 at 03:09:17PM +0100, Jan Beulich wrote: > On 08.03.2022 14:49, Roger Pau Monne wrote: > > If livepatching support is enabled build the hypervisor with > > -f{function,data}-sections compiler options, which is required by the > > livepatching tools to detect changes and create livepatches. > > > > This shouldn't result in any functional change on the hypervisor > > binary image, but does however require some changes in the linker > > script in order to handle that each function and data item will now be > > placed into its own section in object files. As a result add catch-all > > for .text, .data and .bss in order to merge each individual item > > section into the final image. > > > > The main difference will be that .text.startup will end up being part > > of .text rather than .init, and thus won't be freed. .text.exit will > > also be part of .text rather than dropped. Overall this could make the > > image bigger, and package some .text code in a sub-optimal way. > > > > On Arm the .data.read_mostly needs to be moved ahead of the .data > > section like it's already done on x86, so the .data.* catch-all > > doesn't also include .data.read_mostly. The alignment of > > .data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end > > up being placed at the tail of a read-only page from the previous > > section. While there move the alignment of the .data section ahead of > > the section declaration, like it's done for other sections. > > > > The benefit of having CONFIG_LIVEPATCH enable those compiler option > > is that the livepatch build tools no longer need to fiddle with the > > build system in order to enable them. Note the current livepatch tools > > are broken after the recent build changes due to the way they > > attempt to set -f{function,data}-sections. 
> >
> > Signed-off-by: Roger Pau Monné
>
> Reviewed-by: Jan Beulich
>
> > --- a/xen/arch/x86/xen.lds.S
> > +++ b/xen/arch/x86/xen.lds.S
> > @@ -88,6 +88,9 @@ SECTIONS
> >          *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> >
> >          *(.text)
> > +#ifdef CONFIG_CC_SPLIT_SECTIONS
> > +        *(.text.*)
> > +#endif
> >          *(.text.__x86_indirect_thunk_*)
> >          *(.text.page_aligned)
>
> These last two now will not have any effect anymore when
> CC_SPLIT_SECTIONS=y. This may have undesirable effects on the
> overall size when there is more than one object with a
> .text.page_aligned contribution. In .data ...

Agreed. I wondered whether to move those ahead of the main text
section, so likely:

    *(.text.unlikely .text.*_unlikely .text.unlikely.*)

    *(.text.page_aligned)
    *(.text.__x86_indirect_thunk_*)

    *(.text)
#ifdef CONFIG_CC_SPLIT_SECTIONS
    *(.text.*)
#endif

FWIW, Linux seems fine to package .text.page_aligned together with the
rest of .text using the .text.[0-9a-zA-Z_]* catch-all.

> > @@ -292,9 +295,7 @@ SECTIONS
> >
> >    DECL_SECTION(.data) {
> >         *(.data.page_aligned)
> > -       *(.data)
> > -       *(.data.rel)
> > -       *(.data.rel.*)
> > +       *(.data .data.*)
> >    } PHDR(text)
>
> ... this continues to be named first. I wonder whether we wouldn't
> want to use SORT_BY_ALIGNMENT (if available) instead in both places.

We could use the command line option if available
(--sort-section=alignment) to sort all wildcard sections?

Thanks, Roger.
Re: [PATCH 4/4] livepatch: differentiate between old and new build systems
On Tue, Mar 08, 2022 at 02:38:47PM +, Andrew Cooper wrote: > On 02/03/2022 14:27, Roger Pau Monne wrote: > > diff --git a/livepatch-build b/livepatch-build > > index 38a92be..656cdac 100755 > > --- a/livepatch-build > > +++ b/livepatch-build > > @@ -98,14 +98,20 @@ function build_special() > > > > # Build with special GCC flags > > cd "${SRCDIR}/xen" || die > > -sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc -ffunction-sections > > -fdata-sections/' Rules.mk > > -cp -p arch/x86/Makefile arch/x86/Makefile.bak > > -sed -i 's/--section-alignment=0x20/--section-alignment=0x1000/' > > arch/x86/Makefile > > -# Restore timestamps to prevent spurious rebuilding > > -touch --reference=arch/x86/Makefile.bak arch/x86/Makefile > > -make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" || > > die > > -sed -i 's/CFLAGS += -nostdinc -ffunction-sections > > -fdata-sections/CFLAGS += -nostdinc/' Rules.mk > > -mv -f arch/x86/Makefile.bak arch/x86/Makefile > > +if grep -q 'nostdinc' Rules.mk; then > > + # Support for old build system, attempt to set > > -f{function,data}-sections and rebuild > > +sed -i 's/CFLAGS += -nostdinc/CFLAGS += -nostdinc > > -ffunction-sections -fdata-sections/' Rules.mk > > +cp -p arch/x86/Makefile arch/x86/Makefile.bak > > +sed -i > > 's/--section-alignment=0x20/--section-alignment=0x1000/' > > arch/x86/Makefile > > +# Restore timestamps to prevent spurious rebuilding > > +touch --reference=arch/x86/Makefile.bak arch/x86/Makefile > > +make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" > > || die > > +sed -i 's/CFLAGS += -nostdinc -ffunction-sections > > -fdata-sections/CFLAGS += -nostdinc/' Rules.mk > > +mv -f arch/x86/Makefile.bak arch/x86/Makefile > > +else > > +# -f{function,data}-sections set by CONFIG_LIVEPATCH > > +make "-j$CPUS" $XEN_DEBUG &> "${OUTPUT}/build_${name}_compile.log" > > || die > > +fi > > This really ought to be the other way around, by spotting the thing we > know is good, and then falling back to the 
heuristics. In light of the > updates to the Xen side, something like: I'm not sure I agree. I do prefer to spot the 'bad' one, and just fall back to expecting Xen to correctly set -f{function,data}-sections otherwise. > if grep -q CC_SPLIT_SECTIONS Kconfig; then Because this logic ties us to not moving CC_SPLIT_SECTIONS from being defined in xen/Kconfig (or even changing its name), and again ties the livepatch tools to internal details about the Xen build system. Thanks, Roger.
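The two detection strategies being debated above can be sketched as follows. This is an illustrative shell fragment, not the actual livepatch-build code: the `detect_build_system` helper and the temporary file names are made up for the example, and only the grep patterns mirror the thread.

```shell
#!/bin/sh
# Hypothetical sketch of the detection logic debated above: spot the old
# build system (Rules.mk still carrying "CFLAGS += -nostdinc") and
# otherwise assume the Xen side already sets -f{function,data}-sections
# via CONFIG_LIVEPATCH.
detect_build_system() {
    rules_mk="$1"
    if grep -q 'CFLAGS += -nostdinc' "$rules_mk"; then
        echo "old"   # needs the sed-based -f{function,data}-sections hack
    else
        echo "new"   # rely on CONFIG_LIVEPATCH / CC_SPLIT_SECTIONS
    fi
}

printf 'CFLAGS += -nostdinc\n' > /tmp/Rules.mk.old
printf 'obj-y += foo.o\n'      > /tmp/Rules.mk.new
detect_build_system /tmp/Rules.mk.old   # -> old
detect_build_system /tmp/Rules.mk.new   # -> new
```

Roger's counter-argument applies to either direction of the test: whichever marker the tools grep for becomes a de-facto stable interface of the Xen build system.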
Re: [PATCH v3 0/2] livepatch: enable -f{function,data}-sections compiler option
Hi, On 08/03/2022 13:49, Roger Pau Monne wrote: Hello, The content in v3 has been split in two patches, but is still mostly the same. The main difference is that first patch does a bit of cleanup of the build logic now that the header object file doesn't need to be the first one passed to the linker script. Thanks, Roger. Roger Pau Monne (2): xen/build: put image header into a separate section livepatch: set -f{function,data}-sections compiler option For the Arm bits: Acked-by: Julien Grall # xen/arm Cheers, xen/Kconfig | 4 +++ xen/Makefile| 2 ++ xen/arch/arm/arch.mk| 2 -- xen/arch/arm/arm32/Makefile | 3 +-- xen/arch/arm/arm32/head.S | 1 + xen/arch/arm/arm64/Makefile | 3 +-- xen/arch/arm/arm64/head.S | 1 + xen/arch/arm/xen.lds.S | 49 - xen/arch/x86/Makefile | 5 ++-- xen/arch/x86/arch.mk| 2 -- xen/arch/x86/boot/head.S| 2 +- xen/arch/x86/xen.lds.S | 22 ++--- xen/common/Kconfig | 1 + 13 files changed, 54 insertions(+), 43 deletions(-) -- Julien Grall
Re: [PATCH v3 1/2] xen/build: put image header into a separate section
On 08.03.2022 15:18, Roger Pau Monné wrote: > On Tue, Mar 08, 2022 at 02:57:23PM +0100, Jan Beulich wrote: >> On 08.03.2022 14:49, Roger Pau Monne wrote: >>> So it can be explicitly placed ahead of the rest of the .text content >>> in the linker script (and thus the resulting image). This is a >>> prerequisite for further work that will add a catch-all to the text >>> section (.text.*). >>> >>> Note that placement of the sections inside of .text is also slightly >>> adjusted to be more similar to the position found in the default GNU >>> ld linker script. >>> >>> The special handling of the object file containing the header data as >>> the first object file passed to the linker command line can also be >>> removed. >>> >>> While there also remove the special handling of efi/ on x86. There's >>> no need for the resulting object file to be passed in any special >>> order to the linker. >>> >>> Signed-off-by: Roger Pau Monné >> >> Looks good to me, but I have one question before feeling ready to >> offer R-b: >> >>> @@ -86,8 +84,13 @@ SECTIONS >>> *(.text.kexec) /* Page aligned in the object file. */ >>> kexec_reloc_end = .; >>> >>> - *(.text.cold) >>> - *(.text.unlikely) >>> + *(.text.cold .text.cold.*) >>> + *(.text.unlikely .text.*_unlikely .text.unlikely.*) >> >> What generates .text.*_unlikely? And if anything really does, why >> would .text.cold not have a similar equivalent? > > That matches what I saw in the default linker script from my version > of GNU ld: > > *(.text.unlikely .text.*_unlikely .text.unlikely.*) > > I really don't know what could generate .text.*_unlikely, but since > it's part of the default linker script I assumed it was better to just > add it. I've checked - gcc up to 4.5.x would generate .text.*_unlikely; from 4.6.x. onwards it would be .text.unlikely.*. As to the dissimilarity with .text.cold: I wonder why we have that in the first place. It matches our __cold attribute, just that we don't use that anywhere afaics. 
In any event: Reviewed-by: Jan Beulich albeit preferably with .text.cold.* dropped again. Jan
Re: [PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option
On 08.03.2022 15:46, Roger Pau Monné wrote: > On Tue, Mar 08, 2022 at 03:09:17PM +0100, Jan Beulich wrote: >> On 08.03.2022 14:49, Roger Pau Monne wrote: >>> If livepatching support is enabled, build the hypervisor with >>> -f{function,data}-sections compiler options, which is required by the >>> livepatching tools to detect changes and create livepatches. >>> >>> This shouldn't result in any functional change on the hypervisor >>> binary image, but does however require some changes in the linker >>> script in order to handle that each function and data item will now be >>> placed into its own section in object files. As a result add catch-all >>> for .text, .data and .bss in order to merge each individual item >>> section into the final image. >>> >>> The main difference will be that .text.startup will end up being part >>> of .text rather than .init, and thus won't be freed. .text.exit will >>> also be part of .text rather than dropped. Overall this could make the >>> image bigger, and package some .text code in a sub-optimal way. >>> >>> On Arm the .data.read_mostly needs to be moved ahead of the .data >>> section like it's already done on x86, so the .data.* catch-all >>> doesn't also include .data.read_mostly. The alignment of >>> .data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end >>> up being placed at the tail of a read-only page from the previous >>> section. While there move the alignment of the .data section ahead of >>> the section declaration, like it's done for other sections. >>> >>> The benefit of having CONFIG_LIVEPATCH enable those compiler options >>> is that the livepatch build tools no longer need to fiddle with the >>> build system in order to enable them. Note the current livepatch tools >>> are broken after the recent build changes due to the way they >>> attempt to set -f{function,data}-sections.
>>> >>> Signed-off-by: Roger Pau Monné >> >> Reviewed-by: Jan Beulich >> >>> --- a/xen/arch/x86/xen.lds.S >>> +++ b/xen/arch/x86/xen.lds.S >>> @@ -88,6 +88,9 @@ SECTIONS >>> *(.text.unlikely .text.*_unlikely .text.unlikely.*) >>> >>> *(.text) >>> +#ifdef CONFIG_CC_SPLIT_SECTIONS >>> + *(.text.*) >>> +#endif >>> *(.text.__x86_indirect_thunk_*) >>> *(.text.page_aligned) >> >> These last two now will not have any effect anymore when >> CC_SPLIT_SECTIONS=y. This may have undesirable effects on the >> overall size when there is more than one object with a >> .text.page_aligned contribution. In .data ... > > Agreed. I wondered whether to move those ahead of the main text > section, so likely: > >*(.text.unlikely .text.*_unlikely .text.unlikely.*) > >*(.text.page_aligned) >*(.text.__x86_indirect_thunk_*) >*(.text) > #ifdef CONFIG_CC_SPLIT_SECTIONS >*(.text.*) > #endif Perhaps; I'm not really worried about .text.__x86_indirect_thunk_*, though. When adding .text.* that one can likely go away. > FWIW, Linux seems fine to package .text.page_aligned together with the > rest of .text using the .text.[0-9a-zA-Z_]* catch-all. There's no question this is functionally fine. The question is how many extra padding areas are inserted because of this. >>> @@ -292,9 +295,7 @@ SECTIONS >>> >>>DECL_SECTION(.data) { >>> *(.data.page_aligned) >>> - *(.data) >>> - *(.data.rel) >>> - *(.data.rel.*) >>> + *(.data .data.*) >>>} PHDR(text) >> >> ... this continues to be named first. I wonder whether we wouldn't >> want to use SORT_BY_ALIGNMENT (if available) instead in both places. > > We could use the command line option if available > (--sort-section=alignment) to sort all wildcard sections? Depends on the scope of the sorting that would result when enabled globally like this. Jan
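For readers unfamiliar with the linker feature being discussed, a SORT_BY_ALIGNMENT variant of the x86 .data rule could look roughly like the fragment below. This is a sketch of the idea only, not a proposed patch; whether it behaves acceptably depends on the linker version, and the command-line alternative (--sort-section=alignment) raises Jan's open question about how globally the sorting would then apply.

```ld
DECL_SECTION(.data) {
    /* Sort all matched input sections by decreasing alignment, so that
       page-aligned contributions cluster together and inter-section
       padding is minimized, instead of relying on match order. */
    *(SORT_BY_ALIGNMENT(.data.page_aligned .data .data.*))
} PHDR(text)
```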
Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
On 08/03/2022 14:37, Jan Beulich wrote: > On 08.03.2022 15:01, Andrew Cooper wrote: >> For livepatching, we need to look at a potentially clobbered function and >> determine whether it used to have an ENDBR64 instruction. >> >> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and >> introduce >> the was_endbr64() predicate. > Did you consider using ENDBR32 for this purpose? No, and no because that's very short sighted. Even 4 non-nops would be better than ENDBR32, because they wouldn't create actually-usable codepaths in corner cases we care to exclude. > I'm worried that > the pattern you picked is still too close to what might be used > (down the road) by a tool chain. This is what Linux are doing too, but no - I'm not worried. The encoding isn't the only protection; toolchains also have no reason to put a nop4 at the head of functions; nop5 is the common one to find. > One neat thing about ENDBR32 would be that you wouldn't even > need memcpy() - you'd merely flip the low main opcode bit. Not relevant. You're taking the SMC pipeline hit for any sized write, and a single movl is far less cryptic. >> Bjoern: For the livepatching code, I think you want: >> >> if ( is_endbr64(...) || was_endbr64(...) ) >> needed += ENDBR64_LEN; > Looks like I didn't fully understand the problem then from your > initial description. The adjustment here (and the one needed in > Björn's patch) is to compensate for the advancing of the > targets of altcalls past the ENDBR? No. Consider this scenario: callee: endbr64 ... altcall: call *foo(%rip) During boot, we rewrite altcall to be `call callee+4` and turn endbr64 into nops, so it now looks like: callee: nop4 ... altcall: call callee+4 Then we want to livepatch callee to callee_new, so we get callee_new: endbr64 ... in the livepatch itself. Now, to actually patch, we need to memcpy(callee+4, "jmp callee_new", 5). 
The livepatch logic calling is_endbr(callee) doesn't work because it's now a nop4, which is why it needs a was_endbr64(callee) too. > >> --- a/xen/arch/x86/include/asm/endbr.h >> +++ b/xen/arch/x86/include/asm/endbr.h >> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr) >> *(uint32_t *)ptr = gen_endbr64(); >> } >> >> +/* >> + * After clobbering ENDBR64, we may need to confirm that the site used to >> + * contain an ENDBR64 instruction. Use an encoding which isn't the default >> + * P6_NOP4. >> + */ >> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */ > In case this remains as is - did you mean "opsz" instead of "osp"? > But this really is "nopw (%rax)" anyway. Oh, osp is the nasm name. I'll switch to nopw. ~Andrew
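Putting the pieces of this exchange together, the two four-byte patterns and the pair of predicates can be sketched in plain C. The encodings (ENDBR64 = F3 0F 1E FA, poison = 66 0F 1F 00, default P6_NOP4 = 0F 1F 40 00) come from the thread; the function bodies are an illustration, not Xen's exact implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* ENDBR64 as emitted by toolchains for indirectly callable functions. */
static const uint8_t endbr64_insn[4]   = { 0xf3, 0x0f, 0x1e, 0xfa };

/* The dedicated poison NOP4 ("nopw (%rax)"), deliberately different from
 * the default P6_NOP4 (0f 1f 40 00) so a clobbered site stays recognizable. */
static const uint8_t endbr64_poison[4] = { 0x66, 0x0f, 0x1f, 0x00 };

static bool is_endbr64(const void *ptr)
{
    return memcmp(ptr, endbr64_insn, sizeof(endbr64_insn)) == 0;
}

/* True if the site held an ENDBR64 before the cf_clobber pass NOPed it. */
static bool was_endbr64(const void *ptr)
{
    return memcmp(ptr, endbr64_poison, sizeof(endbr64_poison)) == 0;
}
```

With these two predicates, the livepatch size check becomes the form Andrew suggests above: `if ( is_endbr64(...) || was_endbr64(...) ) needed += ENDBR64_LEN;`.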
[PATCH v4 1/2] Livepatch: resolve old address before function verification
When verifying that a livepatch can be applied, we may as well want to inspect the target function to be patched. To do so, we need to resolve this function's address before running the arch-specific livepatch_verify hook. Signed-off-by: Bjoern Doebel Acked-by: Konrad Rzeszutek Wilk Reviewed-by: Ross Lagerwall --- xen/common/livepatch.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c index ec301a9f12..be2cf75c2d 100644 --- a/xen/common/livepatch.c +++ b/xen/common/livepatch.c @@ -684,11 +684,11 @@ static int prepare_payload(struct payload *payload, return -EINVAL; } -rc = arch_livepatch_verify_func(f); +rc = resolve_old_address(f, elf); if ( rc ) return rc; -rc = resolve_old_address(f, elf); +rc = arch_livepatch_verify_func(f); if ( rc ) return rc; -- 2.32.0 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
[PATCH v4 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
Xen enabled CET for supporting architectures. The control flow aspect of CET expects functions that can be called indirectly (i.e., via function pointers) to start with an ENDBR64 instruction. Otherwise a control flow exception is raised. This expectation breaks livepatching flows because we patch functions by overwriting their first 5 bytes with a JMP + , thus breaking the ENDBR64. We fix this by checking the start of a patched function for being ENDBR64. In the positive case we move the livepatch JMP to start behind the ENDBR64 instruction. To avoid having to guess the ENDBR64 offset again on patch reversal (which might race with other mechanisms adding/removing ENDBR dynamically), use the livepatch metadata to store the computed offset along with the saved bytes of the overwritten function. Signed-off-by: Bjoern Doebel Acked-by: Konrad Rzeszutek Wilk CC: Ross Lagerwall Note that on top of livepatching functions, Xen supports an additional mode where we can "remove" a function by overwriting it with NOPs. This is only supported for functions up to 31 bytes in size and this patch reduces this limit to 30 bytes. Changes since r1: * use sizeof_field() to avoid unused variable warning * make metadata variable const in arch_livepatch_revert * rebase on top and make use of Andrew Cooper's was_endbr64() patch --- xen/arch/x86/livepatch.c | 61 ++-- 1 file changed, 53 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c index 65530c1e57..5380e18bd9 100644 --- a/xen/arch/x86/livepatch.c +++ b/xen/arch/x86/livepatch.c @@ -14,11 +14,29 @@ #include #include +#include #include #include #include #include +/* + * CET hotpatching support: We may have functions starting with an ENDBR64 + * instruction that MUST remain the first instruction of the function, hence + * we need to move any hotpatch trampoline further into the function. 
For that + * we need to keep track of the patching offset used for any loaded hotpatch + * (to avoid racing against other fixups adding/removing ENDBR64 or similar + * instructions). + * + * We do so by making use of the existing opaque metadata area. We use its + * first 4 bytes to track the offset into the function used for patching and + * the remainder of the data to store overwritten code bytes. + */ +struct x86_livepatch_meta { +uint8_t patch_offset; +uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)]; +}; + static bool has_active_waitqueue(const struct vm_event_domain *ved) { /* ved may be xzalloc()'d without INIT_LIST_HEAD() yet. */ @@ -104,18 +122,34 @@ void noinline arch_livepatch_revive(void) int arch_livepatch_verify_func(const struct livepatch_func *func) { +BUILD_BUG_ON(sizeof(struct x86_livepatch_meta) != LIVEPATCH_OPAQUE_SIZE); + /* If NOPing.. */ if ( !func->new_addr ) { /* Only do up to maximum amount we can put in the ->opaque. */ -if ( func->new_size > sizeof(func->opaque) ) +if ( func->new_size > sizeof_field(struct x86_livepatch_meta, + instruction) ) return -EOPNOTSUPP; if ( func->old_size < func->new_size ) return -EINVAL; } -else if ( func->old_size < ARCH_PATCH_INSN_SIZE ) -return -EINVAL; +else +{ +/* + * Space needed now depends on whether the target function + * starts with an ENDBR64 instruction. 
+ */ +uint8_t needed; + +needed = ARCH_PATCH_INSN_SIZE; +if ( is_endbr64(func->old_addr) || was_endbr64(func->old_addr) ) +needed += ENDBR64_LEN; + +if ( func->old_size < needed ) +return -EINVAL; +} return 0; } @@ -127,15 +161,21 @@ int arch_livepatch_verify_func(const struct livepatch_func *func) void noinline arch_livepatch_apply(struct livepatch_func *func) { uint8_t *old_ptr; -uint8_t insn[sizeof(func->opaque)]; +struct x86_livepatch_meta *lp; +uint8_t insn[sizeof(lp->instruction)]; unsigned int len; +lp = (struct x86_livepatch_meta *)func->opaque; +lp->patch_offset = 0; old_ptr = func->old_addr; len = livepatch_insn_len(func); if ( !len ) return; -memcpy(func->opaque, old_ptr, len); +if ( is_endbr64(old_ptr) ) +lp->patch_offset += ENDBR64_LEN; + +memcpy(lp->instruction, old_ptr + lp->patch_offset, len); if ( func->new_addr ) { int32_t val; @@ -143,14 +183,15 @@ void noinline arch_livepatch_apply(struct livepatch_func *func) BUILD_BUG_ON(ARCH_PATCH_INSN_SIZE != (1 + sizeof(val))); insn[0] = 0xe9; /* Relative jump. */ -val = func->new_addr - func->old_addr - ARCH_PATCH_INSN_SIZE; +val = func->new_addr - (func->old_addr + lp->patch_offset ++ ARCH_PATCH_INSN_SIZE); memcpy(&insn[1], &val, sizeof(val)); } e
Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
> From: Bjoern Doebel > Sent: Tuesday, March 8, 2022 10:29 AM > To: xen-devel@lists.xenproject.org > Cc: Michael Kurth ; Martin Pohlack ; > Roger Pau Monne ; Andrew Cooper > ; Bjoern Doebel ; Konrad > Rzeszutek Wilk ; Ross Lagerwall > > Subject: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced > functions > > Xen enabled CET for supporting architectures. The control flow aspect of > CET expects functions that can be called indirectly (i.e., via function > pointers) to start with an ENDBR64 instruction. Otherwise a control flow > exception is raised. > > This expectation breaks livepatching flows because we patch functions by > overwriting their first 5 bytes with a JMP + , thus breaking the > ENDBR64. We fix this by checking the start of a patched function for > being ENDBR64. In the positive case we move the livepatch JMP to start > behind the ENDBR64 instruction. > > To avoid having to guess the ENDBR64 offset again on patch reversal > (which might race with other mechanisms adding/removing ENDBR > dynamically), use the livepatch metadata to store the computed offset > along with the saved bytes of the overwritten function. > > Signed-off-by: Bjoern Doebel > CC: Konrad Rzeszutek Wilk > CC: Ross Lagerwall > > Note that on top of livepatching functions, Xen supports an additional > mode where we can "remove" a function by overwriting it with NOPs. This > is only supported for functions up to 31 bytes in size and this patch > reduces this limit to 30 bytes. 
> > Changes since r1: > * use sizeof_field() to avoid unused variable warning > * make metadata variable const in arch_livepatch_revert > --- > xen/arch/x86/livepatch.c | 61 ++-- > 1 file changed, 53 insertions(+), 8 deletions(-) > > diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c > index 65530c1e57..0fd97f2a00 100644 > --- a/xen/arch/x86/livepatch.c > +++ b/xen/arch/x86/livepatch.c > @@ -14,11 +14,29 @@ > #include > #include > > +#include > #include > #include > #include > #include > > +/* > + * CET hotpatching support: We may have functions starting with an ENDBR64 > + * instruction that MUST remain the first instruction of the function, hence > + * we need to move any hotpatch trampoline further into the function. For > that > + * we need to keep track of the patching offset used for any loaded hotpatch > + * (to avoid racing against other fixups adding/removing ENDBR64 or similar > + * instructions). > + * > + * We do so by making use of the existing opaque metadata area. We use its > + * first 4 bytes to track the offset into the function used for patching and > + * the remainder of the data to store overwritten code bytes. > + */ > +struct x86_livepatch_meta { > +uint8_t patch_offset; > +uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)]; > +}; > + I think it would make things a bit simpler to use one of the spare _pad bits from struct livepatch_func rather than hacking it into the opaque area. Is there a reason you chose this approach?
[PATCH 4.16] VT-d: drop undue address-of from check_cleanup_domid_map()
For an unknown reason I added back the operator while backporting, despite 4.16 having c06e3d810314 ("VT-d: per-domain IOMMU bitmap needs to have dynamic size"). I can only assume that I mistakenly took the 4.15 backport as basis and/or reference. Fixes: fa45f6b5560e ("VT-d: split domid map cleanup check into a function") Signed-off-by: Jan Beulich --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -197,7 +197,7 @@ static void check_cleanup_domid_map(stru if ( !found ) { -clear_bit(iommu->index, &dom_iommu(d)->arch.vtd.iommu_bitmap); +clear_bit(iommu->index, dom_iommu(d)->arch.vtd.iommu_bitmap); cleanup_domid_map(d, iommu); } }
Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
On 08.03.2022 16:19, Andrew Cooper wrote: > On 08/03/2022 14:37, Jan Beulich wrote: >> On 08.03.2022 15:01, Andrew Cooper wrote: >>> For livepatching, we need to look at a potentially clobbered function and >>> determine whether it used to have an ENDBR64 instruction. >>> >>> Use a non-default 4-byte P6 long nop, not emitted by toolchains, and >>> introduce >>> the was_endbr64() predicate. >> Did you consider using ENDBR32 for this purpose? > > No, and no because that's very short sighted. Even 4 non-nops would be > better than ENDBR32, because they wouldn't create actually-usable > codepaths in corner cases we care to exclude. Well - I thought of ENDBR32 because elsewhere you said we don't need to be worried about any byte sequence resembling that insn, since for it to become "usable" an attacker would first need to have managed to manufacture a 32-bit ring0 code segment. Now you say effectively the opposite. >> I'm worried that >> the pattern you picked is still too close to what might be used >> (down the road) by a tool chain. > > This is what Linux are doing too, but no - I'm not worried. The > encoding isn't the only protection; toolchains also have no reason to > put a nop4 at the head of functions; nop5 is the common one to find. Well, okay - let's hope for the best then. >>> Bjoern: For the livepatching code, I think you want: >>> >>> if ( is_endbr64(...) || was_endbr64(...) ) >>> needed += ENDBR64_LEN; >> Looks like I didn't fully understand the problem then from your >> initial description. The adjustment here (and the one needed in >> Björn's patch) is to compensate for the advancing of the >> targets of altcalls past the ENDBR? > > No. Consider this scenario: > > callee: > endbr64 > ... > > altcall: > call *foo(%rip) > > During boot, we rewrite altcall to be `call callee+4` and turn endbr64 > into nops, so it now looks like: > > callee: > nop4 > ... 
> > altcall: > call callee+4 > > Then we want to livepatch callee to callee_new, so we get > > callee_new: > endbr64 > ... > > in the livepatch itself. > > Now, to actually patch, we need to memcpy(callee+4, "jmp callee_new", 5). > > The livepatch logic calling is_endbr(callee) doesn't work because it's > now a nop4, which is why it needs a was_endbr64(callee) too. Sounds like exactly what I was thinking of. Perhaps my description wasn't sufficiently clear / unambiguous then. >>> --- a/xen/arch/x86/include/asm/endbr.h >>> +++ b/xen/arch/x86/include/asm/endbr.h >>> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr) >>> *(uint32_t *)ptr = gen_endbr64(); >>> } >>> >>> +/* >>> + * After clobbering ENDBR64, we may need to confirm that the site used to >>> + * contain an ENDBR64 instruction. Use an encoding which isn't the default >>> + * P6_NOP4. >>> + */ >>> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */ >> In case this remains as is - did you mean "opsz" instead of "osp"? >> But this really is "nopw (%rax)" anyway. > > Oh, osp is the nasm name. I'll switch to nopw. Reviewed-by: Jan Beulich Jan
[RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations
Comparing a signed 64-bit integer to a signed 32-bit integer may lead to unexpected overflows. Adjust the cast to use the same type. Signed-off-by: Bjoern Doebel CC: Konrad Rzeszutek Wilk CC: Ross Lagerwall --- I need some input here. When testing the CET-IBT livepatch updates I noticed that my generated livepatch would not load due to (XEN) livepatch: vmx: Overflow in relocation 1 in .rela.altinstructions for .altinstructions A deeper look revealed that the ELF relocation adjustment seems to be going wrong and that in fact the lower 32 bits of the compared values in my case were identical, but that the cast to int64_t for the value pulled in extra 32 bits, which turned out to be different. Applying this patch fixed the issue for my example and I got a fully working livepatch. However, I do not understand what is actually going on here, so I'm sending this RFC to get extra eyes on the code. --- xen/arch/x86/livepatch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c index 59620b8a4f..5380e18bd9 100644 --- a/xen/arch/x86/livepatch.c +++ b/xen/arch/x86/livepatch.c @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf, val -= (uint64_t)dest; *(int32_t *)dest = val; -if ( (int64_t)val != *(int32_t *)dest ) +if ( (int32_t)val != *(int32_t *)dest ) { printk(XENLOG_ERR LIVEPATCH "%s: Overflow in relocation %u in %s for %s\n", elf->name, i, rela->name, base->name); -- 2.32.0
Re: [PATCH v4 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
On 08.03.2022 16:23, Bjoern Doebel wrote: > @@ -127,15 +161,21 @@ int arch_livepatch_verify_func(const struct > livepatch_func *func) > void noinline arch_livepatch_apply(struct livepatch_func *func) > { > uint8_t *old_ptr; > -uint8_t insn[sizeof(func->opaque)]; > +struct x86_livepatch_meta *lp; > +uint8_t insn[sizeof(lp->instruction)]; > unsigned int len; > > +lp = (struct x86_livepatch_meta *)func->opaque; > +lp->patch_offset = 0; > old_ptr = func->old_addr; > len = livepatch_insn_len(func); > if ( !len ) > return; > > -memcpy(func->opaque, old_ptr, len); > +if ( is_endbr64(old_ptr) ) > +lp->patch_offset += ENDBR64_LEN; Don't you need to use was_endbr64() here as well? Jan
Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
On 08.03.22 16:25, Ross Lagerwall wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. From: Bjoern Doebel Sent: Tuesday, March 8, 2022 10:29 AM To: xen-devel@lists.xenproject.org Cc: Michael Kurth ; Martin Pohlack ; Roger Pau Monne ; Andrew Cooper ; Bjoern Doebel ; Konrad Rzeszutek Wilk ; Ross Lagerwall Subject: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions Xen enabled CET for supporting architectures. The control flow aspect of CET expects functions that can be called indirectly (i.e., via function pointers) to start with an ENDBR64 instruction. Otherwise a control flow exception is raised. This expectation breaks livepatching flows because we patch functions by overwriting their first 5 bytes with a JMP + , thus breaking the ENDBR64. We fix this by checking the start of a patched function for being ENDBR64. In the positive case we move the livepatch JMP to start behind the ENDBR64 instruction. To avoid having to guess the ENDBR64 offset again on patch reversal (which might race with other mechanisms adding/removing ENDBR dynamically), use the livepatch metadata to store the computed offset along with the saved bytes of the overwritten function. Signed-off-by: Bjoern Doebel CC: Konrad Rzeszutek Wilk CC: Ross Lagerwall Note that on top of livepatching functions, Xen supports an additional mode where we can "remove" a function by overwriting it with NOPs. This is only supported for functions up to 31 bytes in size and this patch reduces this limit to 30 bytes. 
Changes since r1: * use sizeof_field() to avoid unused variable warning * make metadata variable const in arch_livepatch_revert --- xen/arch/x86/livepatch.c | 61 ++-- 1 file changed, 53 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c index 65530c1e57..0fd97f2a00 100644 --- a/xen/arch/x86/livepatch.c +++ b/xen/arch/x86/livepatch.c @@ -14,11 +14,29 @@ #include #include +#include #include #include #include #include +/* + * CET hotpatching support: We may have functions starting with an ENDBR64 + * instruction that MUST remain the first instruction of the function, hence + * we need to move any hotpatch trampoline further into the function. For that + * we need to keep track of the patching offset used for any loaded hotpatch + * (to avoid racing against other fixups adding/removing ENDBR64 or similar + * instructions). + * + * We do so by making use of the existing opaque metadata area. We use its + * first 4 bytes to track the offset into the function used for patching and + * the remainder of the data to store overwritten code bytes. + */ +struct x86_livepatch_meta { +uint8_t patch_offset; +uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)]; +}; + I think it would make things a bit simpler to use one of the spare _pad bits from struct livepatch_func rather than hacking it into the opaque area. Is there a reason you chose this approach? No specific reason. Are you suggesting updating the public livepatch interface to add a new member and reduce the padding size by 1 byte? I guess that will also require a patch to livepatch-build-tools? Bjoern
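To make the trade-off in this exchange concrete, here is a compile-time sketch of the two layouts. The first struct mirrors the posted patch; the second is only a guess at the shape of Ross's suggestion, with illustrative field names and sizes, and is not the actual public livepatch ABI.

```c
#include <stdint.h>

#define LIVEPATCH_OPAQUE_SIZE 31   /* size used throughout the thread */

/* (a) As posted: the offset is carved out of the opaque scratch area,
 * shrinking the room for saved instruction bytes from 31 to 30 (hence
 * the NOP-removal limit dropping from 31 to 30 bytes). */
struct x86_livepatch_meta {
    uint8_t patch_offset;
    uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)];
};

/* (b) Rough shape of the alternative: take one byte from the spare
 * padding of the public struct instead, keeping all 31 opaque bytes
 * for saved code. Fields here are purely illustrative. */
struct livepatch_func_sketch {
    uint8_t opaque[LIVEPATCH_OPAQUE_SIZE];
    uint8_t applied;
    uint8_t patch_offset;   /* byte taken from the former padding */
    uint8_t pad[6];
};
```

As Bjoern notes, variant (b) touches the public interface, so it would also require a matching livepatch-build-tools change.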
Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
On 08/03/2022 11:38, Roger Pau Monné wrote: > On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote: >> Introduce a new per-domain creation x86 specific flag to >> select whether hardware assisted virtualization should be used for >> x{2}APIC. >> >> A per-domain option is added to xl in order to select the usage of >> x{2}APIC hardware assisted virtualization, as well as a global >> configuration option. >> >> Having all APIC interaction exit to Xen for emulation is slow and can >> induce much overhead. Hardware can speed up x{2}APIC by decoding the >> APIC access and providing a VM exit with a more specific exit reason >> than a regular EPT fault or by altogether avoiding a VM exit. >> >> On the other hand, being able to disable x{2}APIC hardware assisted >> virtualization can be useful for testing and debugging purposes. >> >> Note: vmx_install_vlapic_mapping doesn't require modifications >> regardless of whether the guest has "Virtualize APIC accesses" enabled >> or not, i.e., setting the APIC_ACCESS_ADDR VMCS field is fine so long >> as virtualize_apic_accesses is supported by the CPU. 
>> >> Suggested-by: Andrew Cooper >> Signed-off-by: Jane Malalane >> --- >> CC: Wei Liu >> CC: Anthony PERARD >> CC: Juergen Gross >> CC: Andrew Cooper >> CC: George Dunlap >> CC: Jan Beulich >> CC: Julien Grall >> CC: Stefano Stabellini >> CC: Christian Lindig >> CC: David Scott >> CC: Volodymyr Babchuk >> CC: "Roger Pau Monné" >> >> v5: >> * Revert v4 changes in vmx_vlapic_msr_changed(), preserving the use of >>the has_assisted_x{2}apic macros >> * Following changes in assisted_x{2}apic_available definitions in >>patch 1, retighten conditionals for setting >>XEN_HVM_CPUID_APIC_ACCESS_VIRT and XEN_HVM_CPUID_X2APIC_VIRT in >>cpuid_hypervisor_leaves() >> >> v4: >> * Add has_assisted_x{2}apic macros and use them where appropriate >> * Replace CPU checks with per-domain assisted_x{2}apic control >> options in vmx_vlapic_msr_changed() and cpuid_hypervisor_leaves(), >> following edits to assisted_x{2}apic_available definitions in >> patch 1 >> Note: new assisted_x{2}apic_available definitions make later >> cpu_has_vmx_apic_reg_virt and cpu_has_vmx_virtual_intr_delivery >> checks redundant in vmx_vlapic_msr_changed() >> >> v3: >> * Change info in xl.cfg to better express reality and fix >> capitalization of x{2}apic >> * Move "physinfo" variable definition to the beginning of >> libxl__domain_build_info_setdefault() >> * Reposition brackets in if statement to match libxl coding style >> * Shorten logic in libxl__arch_domain_build_info_setdefault() >> * Correct dprintk message in arch_sanitise_domain_config() >> * Make appropriate changes in vmx_vlapic_msr_changed() and >> cpuid_hypervisor_leaves() for amended "assisted_x2apic" bit >> * Remove unneeded parentheses >> >> v2: >> * Add a LIBXL_HAVE_ASSISTED_APIC macro >> * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo >> * Add a return statement in now "int" >> libxl__arch_domain_build_info_setdefault() >> * Preserve libxl__arch_domain_build_info_setdefault's location in >> libxl_create.c >> * Correct x{2}apic
default setting logic in >> libxl__arch_domain_prepare_config() >> * Correct logic for parsing assisted_x{2}apic host/guest options in >> xl_parse.c and initialize them to -1 in xl.c >> * Use guest options directly in vmx_vlapic_msr_changed >> * Fix indentation of bool assisted_x{2}apic in struct hvm_domain >> * Add a change in xenctrl_stubs.c to pass xenctrl ABI checks >> --- >> docs/man/xl.cfg.5.pod.in| 19 +++ >> docs/man/xl.conf.5.pod.in | 12 >> tools/golang/xenlight/helpers.gen.go| 12 >> tools/golang/xenlight/types.gen.go | 2 ++ >> tools/include/libxl.h | 7 +++ >> tools/libs/light/libxl_arch.h | 5 +++-- >> tools/libs/light/libxl_arm.c| 7 +-- >> tools/libs/light/libxl_create.c | 22 +- >> tools/libs/light/libxl_types.idl| 2 ++ >> tools/libs/light/libxl_x86.c| 28 ++-- >> tools/ocaml/libs/xc/xenctrl.ml | 2 ++ >> tools/ocaml/libs/xc/xenctrl.mli | 2 ++ >> tools/ocaml/libs/xc/xenctrl_stubs.c | 2 +- >> tools/xl/xl.c | 8 >> tools/xl/xl.h | 2 ++ >> tools/xl/xl_parse.c | 16 >> xen/arch/x86/domain.c | 28 +++- >> xen/arch/x86/hvm/vmx/vmcs.c | 4 >> xen/arch/x86/hvm/vmx/vmx.c | 13 - >> xen/arch/x86/include/asm/hvm/domain.h | 6 ++ >> xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 3 +++ >> xen/arch/x86/traps.c| 9 + >> xen/include/public/arch-x86/xen.h | 2 ++ >>
Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations
On 08.03.2022 16:36, Bjoern Doebel wrote:
> --- a/xen/arch/x86/livepatch.c
> +++ b/xen/arch/x86/livepatch.c
> @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
>
>      val -= (uint64_t)dest;
>      *(int32_t *)dest = val;

Afaict after this assignment ...

> -    if ( (int64_t)val != *(int32_t *)dest )
> +    if ( (int32_t)val != *(int32_t *)dest )

... this condition can never be false. The cast really wants to be
to int64_t, and the overflow you saw being reported is quite likely
for a different reason. But from the sole message you did quote
it's not really possible to figure what else is wrong.

Jan
Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
> From: Doebel, Bjoern
> Sent: Tuesday, March 8, 2022 3:41 PM
> To: Ross Lagerwall ; xen-devel@lists.xenproject.org
> Cc: Michael Kurth ; Martin Pohlack ; Roger Pau Monne ;
> Andrew Cooper ; Konrad Rzeszutek Wilk
> Subject: Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching
> CET-enhanced functions
>
> On 08.03.22 16:25, Ross Lagerwall wrote:
> > CAUTION: This email originated from outside of the organization. Do not
> > click links or open attachments unless you can confirm the sender and
> > know the content is safe.
> >
> >> From: Bjoern Doebel
> >> Sent: Tuesday, March 8, 2022 10:29 AM
> >> To: xen-devel@lists.xenproject.org
> >> Cc: Michael Kurth ; Martin Pohlack ; Roger Pau Monne ;
> >> Andrew Cooper ; Bjoern Doebel ; Konrad Rzeszutek Wilk ;
> >> Ross Lagerwall
> >> Subject: [PATCH v3 2/2] xen/x86: Livepatch: support patching
> >> CET-enhanced functions
> >>
> >> Xen enabled CET for supporting architectures. The control flow aspect of
> >> CET expects functions that can be called indirectly (i.e., via function
> >> pointers) to start with an ENDBR64 instruction. Otherwise a control flow
> >> exception is raised.
> >>
> >> This expectation breaks livepatching flows because we patch functions by
> >> overwriting their first 5 bytes with a JMP + , thus breaking the
> >> ENDBR64. We fix this by checking the start of a patched function for
> >> being ENDBR64. In the positive case we move the livepatch JMP to start
> >> behind the ENDBR64 instruction.
> >>
> >> To avoid having to guess the ENDBR64 offset again on patch reversal
> >> (which might race with other mechanisms adding/removing ENDBR
> >> dynamically), use the livepatch metadata to store the computed offset
> >> along with the saved bytes of the overwritten function.
> >>
> >> Signed-off-by: Bjoern Doebel
> >> CC: Konrad Rzeszutek Wilk
> >> CC: Ross Lagerwall
> >>
> >> Note that on top of livepatching functions, Xen supports an additional
> >> mode where we can "remove" a function by overwriting it with NOPs. This
> >> is only supported for functions up to 31 bytes in size and this patch
> >> reduces this limit to 30 bytes.
> >>
> >> Changes since r1:
> >> * use sizeof_field() to avoid unused variable warning
> >> * make metadata variable const in arch_livepatch_revert
> >> ---
> >>  xen/arch/x86/livepatch.c | 61 ++--
> >>  1 file changed, 53 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
> >> index 65530c1e57..0fd97f2a00 100644
> >> --- a/xen/arch/x86/livepatch.c
> >> +++ b/xen/arch/x86/livepatch.c
> >> @@ -14,11 +14,29 @@
> >>  #include
> >>  #include
> >>
> >> +#include
> >>  #include
> >>  #include
> >>  #include
> >>  #include
> >>
> >> +/*
> >> + * CET hotpatching support: We may have functions starting with an
> >> + * ENDBR64 instruction that MUST remain the first instruction of the
> >> + * function, hence we need to move any hotpatch trampoline further into
> >> + * the function. For that we need to keep track of the patching offset
> >> + * used for any loaded hotpatch (to avoid racing against other fixups
> >> + * adding/removing ENDBR64 or similar instructions).
> >> + *
> >> + * We do so by making use of the existing opaque metadata area. We use
> >> + * its first 4 bytes to track the offset into the function used for
> >> + * patching and the remainder of the data to store overwritten code
> >> + * bytes.
> >> + */
> >> +struct x86_livepatch_meta {
> >> +    uint8_t patch_offset;
> >> +    uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)];
> >> +};
> >> +
> >
> > I think it would make things a bit simpler to use one of the spare _pad
> > bits from struct livepatch_func rather than hacking it into the opaque
> > area. Is there a reason you chose this approach?
>
> No specific reason. Are you suggesting updating the public livepatch
> interface to add a new member and reduce the padding size by 1 byte? I
> guess that will also require a patch to livepatch-build-tools?
>
> Bjoern

struct livepatch_func contains info that the build tool needs to
communicate to Xen as well as space for Xen's internal book keeping
while the live patch is applied. This includes the array for storing
instructions, the applied flag, and now additionally the patch offset.
(It's a somewhat odd arrangement but it's what we've got...)

The build tool does not need to know the details about any of Xen's
internal book keeping. So my preference would be to have patch_offset
as an additional member next to applied (reducing padding by 1) and
then in livepatch-build-tools replace:

    unsigned char pad[MAX_REPLACEMENT_SIZE];
    uint8_t applied;
    uint8_t _pad[7];

with simply:

    uint8_t opaque[39];

What do you think?

Ross
Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
On Tue, Mar 08, 2022 at 03:44:18PM +, Jane Malalane wrote:
> On 08/03/2022 11:38, Roger Pau Monné wrote:
> > On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote:
> >> diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> >> b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> >> index 9119aa8536..5b7d662ed7 100644
> >> --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> >> +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
> >> @@ -220,6 +220,9 @@ void vmx_vmcs_reload(struct vcpu *v);
> >>  #define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x8000
> >>  extern u32 vmx_cpu_based_exec_control;
> >>
> >> +#define has_assisted_xapic(d)  ((d)->arch.hvm.assisted_xapic)
> >> +#define has_assisted_x2apic(d) ((d)->arch.hvm.assisted_x2apic)
> >
> > Those macros should not be in an Intel specific header,
> > arch/x86/include/asm/hvm/domain.h is likely a better place.
>
> Actually do you think hvm.h could be better?

I guess that's also fine, I did suggest hvm/domain.h because that's
where the fields get declared. I guess you prefer hvm.h because there
are other HVM related helpers in there?

Thanks, Roger.
Re: [PATCH] x86/cet: Use dedicated NOP4 for cf_clobber
On 08/03/2022 15:36, Jan Beulich wrote:
> On 08.03.2022 16:19, Andrew Cooper wrote:
>> On 08/03/2022 14:37, Jan Beulich wrote:
>>> On 08.03.2022 15:01, Andrew Cooper wrote:
>>>> For livepatching, we need to look at a potentially clobbered function
>>>> and determine whether it used to have an ENDBR64 instruction. Use a
>>>> non-default 4-byte P6 long nop, not emitted by toolchains, and
>>>> introduce the was_endbr64() predicate.
>>> Did you consider using ENDBR32 for this purpose?
>> No, and no because that's very short sighted. Even 4 non-nops would be
>> better than ENDBR32, because they wouldn't create actually-usable
>> codepaths in corner cases we care to exclude.
> Well - I thought of ENDBR32 because elsewhere you said we don't
> need to be worried about any byte sequence resembling that insn,
> since for it to become "usable" an attacker would first need to
> have managed to manufacture a 32-bit ring0 code segment. Now you
> say effectively the opposite.

We've got ~0 risk of having any embedded ENDBR32's because we never
refer to the opcode, and therefore adding 2x 0.7s delay to scan the
binary on build is a waste. If the check were free, it would be a
different matter.

At any point, if we were to introduce references to ENDBR32, we'd want
to start scanning for embedded sequences. But at no point do we want to
intentionally remove our defence in depth created by both having no CS32
code segment, and no ENDBR32 instructions.

> I'm worried that
> the pattern you picked is still too close to what might be used
> (down the road) by a tool chain.

This is what Linux are doing too, but no - I'm not worried. The
encoding isn't the only protection; toolchains also have no reason to
put a nop4 at the head of functions; nop5 is the common one to find.

> Well, okay - let's hope for the best then.

>>>> Bjoern: For the livepatching code, I think you want:
>>>>
>>>>     if ( is_endbr64(...) || was_endbr64(...) )
>>>>         needed += ENDBR64_LEN;
>>> Looks like I didn't fully understand the problem then from your
>>> initial description. The adjustment here (and the one needed in
>>> Björn's patch) is to compensate for the advancing of the
>>> targets of altcalls past the ENDBR?
>> No. Consider this scenario:
>>
>> callee:
>>     endbr64
>>     ...
>>
>> altcall:
>>     call *foo(%rip)
>>
>> During boot, we rewrite altcall to be `call callee+4` and turn endbr64
>> into nops, so it now looks like:
>>
>> callee:
>>     nop4
>>     ...
>>
>> altcall:
>>     call callee+4
>>
>> Then we want to livepatch callee to callee_new, so we get
>>
>> callee_new:
>>     endbr64
>>     ...
>>
>> in the livepatch itself.
>>
>> Now, to actually patch, we need to memcpy(callee+4, "jmp callee_new", 5).
>>
>> The livepatch logic calling is_endbr(callee) doesn't work because it's
>> now a nop4, which is why it needs a was_endbr64(callee) too.
> Sounds like exactly what I was thinking of. Perhaps my description
> wasn't sufficiently clear / unambiguous then.

Ah yes - I think I did misinterpret what you wrote. I hope everything is
clear now.

>>>> --- a/xen/arch/x86/include/asm/endbr.h
>>>> +++ b/xen/arch/x86/include/asm/endbr.h
>>>> @@ -52,4 +52,16 @@ static inline void place_endbr64(void *ptr)
>>>>      *(uint32_t *)ptr = gen_endbr64();
>>>>  }
>>>>
>>>> +/*
>>>> + * After clobbering ENDBR64, we may need to confirm that the site used to
>>>> + * contain an ENDBR64 instruction. Use an encoding which isn't the default
>>>> + * P6_NOP4.
>>>> + */
>>>> +#define ENDBR64_POISON "\x66\x0f\x1f\x00" /* osp nopl (%rax) */
>>> In case this remains as is - did you mean "opsz" instead of "osp"?
>>> But this really is "nopw (%rax)" anyway.
>> Oh, osp is the nasm name. I'll switch to nopw.
> Reviewed-by: Jan Beulich

Thanks.

~Andrew
Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
On 08/03/2022 16:02, Roger Pau Monné wrote: > On Tue, Mar 08, 2022 at 03:44:18PM +, Jane Malalane wrote: >> On 08/03/2022 11:38, Roger Pau Monné wrote: >>> On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote: diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h index 9119aa8536..5b7d662ed7 100644 --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -220,6 +220,9 @@ void vmx_vmcs_reload(struct vcpu *v); #define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x8000 extern u32 vmx_cpu_based_exec_control; +#define has_assisted_xapic(d) ((d)->arch.hvm.assisted_xapic) +#define has_assisted_x2apic(d) ((d)->arch.hvm.assisted_x2apic) >>> >>> Those macros should not be in an Intel specific header, >>> arch/x86/include/asm/hvm/domain.h is likely a better place. >> >> Actually do you think hvm.h could be better? > > I guess that's also fine, I did suggest hvm/domain.h because that's > where the fields get declared. I guess you prefer hvm.h because there > are other HVM related helpers in there? Yeah, that is why - tsc_scaling_ratio also gets defined in domain.h, for e.g. Thank you for pointing this out, Jane.
Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations
On Tue, Mar 08, 2022 at 04:45:34PM +0100, Jan Beulich wrote:
> On 08.03.2022 16:36, Bjoern Doebel wrote:
> > --- a/xen/arch/x86/livepatch.c
> > +++ b/xen/arch/x86/livepatch.c
> > @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
> >
> >      val -= (uint64_t)dest;
> >      *(int32_t *)dest = val;
>
> Afaict after this assignment ...
>
> > -    if ( (int64_t)val != *(int32_t *)dest )
> > +    if ( (int32_t)val != *(int32_t *)dest )
>
> ... this condition can never be false. The cast really wants to be
> to int64_t, and the overflow you saw being reported is quite likely
> for a different reason. But from the sole message you did quote
> it's not really possible to figure what else is wrong.

It seems Linux has that check ifdef'ed [0], but there's no reference
as to why it's that way (I've tracked it back to the x86-64 import on
the historical tree, f3081f5b66a06175ff2dabfe885a53fb04e71076).

It's a 64bit relocation using a 32bit value, but it's unclear to me
that modifying the top 32bits is not allowed/intended.

Regards, Roger.

[0] https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/module.c#L190
Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
On 08/03/2022 16:02, Roger Pau Monné wrote: > On Tue, Mar 08, 2022 at 03:44:18PM +, Jane Malalane wrote: >> On 08/03/2022 11:38, Roger Pau Monné wrote: >>> On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote: diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h index 9119aa8536..5b7d662ed7 100644 --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -220,6 +220,9 @@ void vmx_vmcs_reload(struct vcpu *v); #define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x8000 extern u32 vmx_cpu_based_exec_control; +#define has_assisted_xapic(d) ((d)->arch.hvm.assisted_xapic) +#define has_assisted_x2apic(d) ((d)->arch.hvm.assisted_x2apic) >>> >>> Those macros should not be in an Intel specific header, >>> arch/x86/include/asm/hvm/domain.h is likely a better place. >> >> Actually do you think hvm.h could be better? > > I guess that's also fine, I did suggest hvm/domain.h because that's > where the fields get declared. I guess you prefer hvm.h because there > are other HVM related helpers in there? Yeah, that is why - tsc_scaling_ratio also gets defined in domain.h for e.g. Thanks again, Jane.
Re: [PATCH] x86/kexec: Fix kexec-reboot with CET active
On 08/03/2022 08:15, Jan Beulich wrote:
> On 07.03.2022 21:53, Andrew Cooper wrote:
>> --- a/xen/arch/x86/machine_kexec.c
>> +++ b/xen/arch/x86/machine_kexec.c
>> @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image)
>>       */
>>      local_irq_disable();
>>
>> +    /* Reset CPUID masking and faulting to the host's default. */
>> +    ctxt_switch_levelling(NULL);
>> +
>> +    /* Disable CET. */
>> +    if ( read_cr4() & X86_CR4_CET )
>> +    {
>> +        wrmsrl(MSR_S_CET, 0);
>> +        write_cr4(read_cr4() & ~X86_CR4_CET);
>> +    }
>> +
>>      /* Now regular interrupts are disabled, we need to reduce the impact
>>       * of interrupts not disabled by 'cli'.
>>       *
> Besides introducing somewhat of a disconnect between the comment in
> context here and the earlier local_irq_disable(), is it really
> necessary to do both actions with IRQs off?

We are a handful of instructions away from discarding Xen's context
entirely. IRQs are not a relevant concern.

If we're nitpicking, irqs want to be off before kexecing gets set,
because absolutely nothing good can come of handling interrupts later
than that point.

~Andrew
Re: [PATCH v5 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
On Tue, Mar 08, 2022 at 04:16:21PM +, Jane Malalane wrote: > On 08/03/2022 16:02, Roger Pau Monné wrote: > > On Tue, Mar 08, 2022 at 03:44:18PM +, Jane Malalane wrote: > >> On 08/03/2022 11:38, Roger Pau Monné wrote: > >>> On Mon, Mar 07, 2022 at 03:06:09PM +, Jane Malalane wrote: > diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h > b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h > index 9119aa8536..5b7d662ed7 100644 > --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h > +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h > @@ -220,6 +220,9 @@ void vmx_vmcs_reload(struct vcpu *v); > #define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x8000 > extern u32 vmx_cpu_based_exec_control; > > +#define has_assisted_xapic(d) ((d)->arch.hvm.assisted_xapic) > +#define has_assisted_x2apic(d) ((d)->arch.hvm.assisted_x2apic) > >>> > >>> Those macros should not be in an Intel specific header, > >>> arch/x86/include/asm/hvm/domain.h is likely a better place. > >> > >> Actually do you think hvm.h could be better? > > > > I guess that's also fine, I did suggest hvm/domain.h because that's > > where the fields get declared. I guess you prefer hvm.h because there > > are other HVM related helpers in there? > > Yeah, that is why - tsc_scaling_ratio also gets defined in domain.h for e.g. I'm fine with placing it in hvm.h. Thanks, Roger.
Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations
On Tue, Mar 08, 2022 at 05:15:33PM +0100, Roger Pau Monné wrote: > On Tue, Mar 08, 2022 at 04:45:34PM +0100, Jan Beulich wrote: > > On 08.03.2022 16:36, Bjoern Doebel wrote: > > > --- a/xen/arch/x86/livepatch.c > > > +++ b/xen/arch/x86/livepatch.c > > > @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf > > > *elf, > > > > > > val -= (uint64_t)dest; > > > *(int32_t *)dest = val; > > > > Afaict after this assignment ... > > > > > -if ( (int64_t)val != *(int32_t *)dest ) > > > +if ( (int32_t)val != *(int32_t *)dest ) > > > > ... this condition can never be false. The cast really wants to be > > to int64_t, and the overflow you saw being reported is quite likely > > for a different reason. But from the sole message you did quote > > it's not really possible to figure what else is wrong. > > It seems Linux has that check ifdef'ed [0], but there's no reference > as to why it's that way (I've tracked it back to the x86-64 import on > the historical tree, f3081f5b66a06175ff2dabfe885a53fb04e71076). > > It's a 64bit relocation using a 32bit value, but it's unclear to me > that modifying the top 32bits is not allowed/intended. Urg, I've worded this very badly. It's a 64bit relocation using a 32bit value, but it's unclear to me that modifying the top 32bits is not allowed/intended and fine to be dropped. Thanks, Roger.
Re: [PATCH 4.16] VT-d: drop undue address-of from check_cleanup_domid_map()
On Tue, Mar 08, 2022 at 04:27:00PM +0100, Jan Beulich wrote:
> For an unknown reason I added back the operator while backporting,
> despite 4.16 having c06e3d810314 ("VT-d: per-domain IOMMU bitmap needs
> to have dynamic size"). I can only assume that I mistakenly took the
> 4.15 backport as basis and/or reference.
>
> Fixes: fa45f6b5560e ("VT-d: split domid map cleanup check into a function")
> Signed-off-by: Jan Beulich

Reviewed-by: Roger Pau Monné

Thanks, Roger.
Re: [PATCH v3 1/2] xen/build: put image header into a separate section
On Tue, Mar 08, 2022 at 04:08:53PM +0100, Jan Beulich wrote:
> On 08.03.2022 15:18, Roger Pau Monné wrote:
> > On Tue, Mar 08, 2022 at 02:57:23PM +0100, Jan Beulich wrote:
> >> On 08.03.2022 14:49, Roger Pau Monne wrote:
> >>> So it can be explicitly placed ahead of the rest of the .text content
> >>> in the linker script (and thus the resulting image). This is a
> >>> prerequisite for further work that will add a catch-all to the text
> >>> section (.text.*).
> >>>
> >>> Note that placement of the sections inside of .text is also slightly
> >>> adjusted to be more similar to the position found in the default GNU
> >>> ld linker script.
> >>>
> >>> The special handling of the object file containing the header data as
> >>> the first object file passed to the linker command line can also be
> >>> removed.
> >>>
> >>> While there also remove the special handling of efi/ on x86. There's
> >>> no need for the resulting object file to be passed in any special
> >>> order to the linker.
> >>>
> >>> Signed-off-by: Roger Pau Monné
> >>
> >> Looks good to me, but I have one question before feeling ready to
> >> offer R-b:
> >>
> >>> @@ -86,8 +84,13 @@ SECTIONS
> >>>          *(.text.kexec) /* Page aligned in the object file. */
> >>>          kexec_reloc_end = .;
> >>>
> >>> -        *(.text.cold)
> >>> -        *(.text.unlikely)
> >>> +        *(.text.cold .text.cold.*)
> >>> +        *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> >>
> >> What generates .text.*_unlikely? And if anything really does, why
> >> would .text.cold not have a similar equivalent?
> >
> > That matches what I saw in the default linker script from my version
> > of GNU ld:
> >
> >     *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> >
> > I really don't know what could generate .text.*_unlikely, but since
> > it's part of the default linker script I assumed it was better to just
> > add it.
>
> I've checked - gcc up to 4.5.x would generate .text.*_unlikely; from
> 4.6.x onwards it would be .text.unlikely.*.
>
> As to the dissimilarity with .text.cold: I wonder why we have that in
> the first place. It matches our __cold attribute, just that we don't
> use that anywhere afaics.
>
> In any event:
> Reviewed-by: Jan Beulich
> albeit preferably with .text.cold.* dropped again.

Would you mind dropping the .text.cold.* at commit? Otherwise I can
resend.

Thanks, Roger.
Re: [PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option
On Tue, Mar 08, 2022 at 04:13:55PM +0100, Jan Beulich wrote:
> On 08.03.2022 15:46, Roger Pau Monné wrote:
> > On Tue, Mar 08, 2022 at 03:09:17PM +0100, Jan Beulich wrote:
> >> On 08.03.2022 14:49, Roger Pau Monne wrote:
> >>> If livepatching support is enabled build the hypervisor with
> >>> -f{function,data}-sections compiler options, which is required by the
> >>> livepatching tools to detect changes and create livepatches.
> >>>
> >>> This shouldn't result in any functional change on the hypervisor
> >>> binary image, but does however require some changes in the linker
> >>> script in order to handle that each function and data item will now be
> >>> placed into its own section in object files. As a result add catch-all
> >>> for .text, .data and .bss in order to merge each individual item
> >>> section into the final image.
> >>>
> >>> The main difference will be that .text.startup will end up being part
> >>> of .text rather than .init, and thus won't be freed. .text.exit will
> >>> also be part of .text rather than dropped. Overall this could make the
> >>> image bigger, and package some .text code in a sub-optimal way.
> >>>
> >>> On Arm the .data.read_mostly needs to be moved ahead of the .data
> >>> section like it's already done on x86, so the .data.* catch-all
> >>> doesn't also include .data.read_mostly. The alignment of
> >>> .data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end
> >>> up being placed at the tail of a read-only page from the previous
> >>> section. While there move the alignment of the .data section ahead of
> >>> the section declaration, like it's done for other sections.
> >>>
> >>> The benefit of having CONFIG_LIVEPATCH enable those compiler option
> >>> is that the livepatch build tools no longer need to fiddle with the
> >>> build system in order to enable them. Note the current livepatch tools
> >>> are broken after the recent build changes due to the way they
> >>> attempt to set -f{function,data}-sections.
> >>>
> >>> Signed-off-by: Roger Pau Monné
> >>
> >> Reviewed-by: Jan Beulich
> >>
> >>> --- a/xen/arch/x86/xen.lds.S
> >>> +++ b/xen/arch/x86/xen.lds.S
> >>> @@ -88,6 +88,9 @@ SECTIONS
> >>>          *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> >>>
> >>>          *(.text)
> >>> +#ifdef CONFIG_CC_SPLIT_SECTIONS
> >>> +        *(.text.*)
> >>> +#endif
> >>>          *(.text.__x86_indirect_thunk_*)
> >>>          *(.text.page_aligned)
> >>
> >> These last two now will not have any effect anymore when
> >> CC_SPLIT_SECTIONS=y. This may have undesirable effects on the
> >> overall size when there is more than one object with a
> >> .text.page_aligned contribution. In .data ...
> >
> > Agreed. I wondered whether to move those ahead of the main text
> > section, so likely:
> >
> >    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
> >
> >    *(.text.page_aligned)
> >    *(.text.__x86_indirect_thunk_*)
> >    *(.text)
> > #ifdef CONFIG_CC_SPLIT_SECTIONS
> >    *(.text.*)
> > #endif
>
> Perhaps; I'm not really worried of .text.__x86_indirect_thunk_*,
> though. When adding .text.* that one can likely go away.
>
> > FWIW, Linux seems fine to package .text.page_aligned together with the
> > rest of .text using the .text.[0-9a-zA-Z_]* catch-all.
>
> There's no question this is functionally fine. The question is how
> many extra padding areas are inserted because of this.
>
> >>> @@ -292,9 +295,7 @@ SECTIONS
> >>>
> >>>    DECL_SECTION(.data) {
> >>>         *(.data.page_aligned)
> >>> -       *(.data)
> >>> -       *(.data.rel)
> >>> -       *(.data.rel.*)
> >>> +       *(.data .data.*)
> >>>    } PHDR(text)
> >>
> >> ... this continues to be named first. I wonder whether we wouldn't
> >> want to use SORT_BY_ALIGNMENT (if available) instead in both places.
> >
> > We could use the command line option if available
> > (--sort-section=alignment) to sort all wildcard sections?
>
> Depends on the scope of the sorting that would result when enabled
> globally like this.

I'm not sure I'm following. Don't we generally want to sort by alignment
in order to avoid adding unnecessary padding as much as possible? For
any wildcard sections we really don't care anymore how they are sorted?

Thanks, Roger.
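Jan's per-wildcard SORT_BY_ALIGNMENT suggestion, as opposed to Roger's global `--sort-section=alignment`, could look something like the following hypothetical linker-script fragment. This is a sketch only, not the actual xen.lds.S: the idea is that sections with hard ordering or alignment requirements stay explicit, while only the `-ffunction-sections` catch-all is sorted by the linker to minimise padding.

```ld
/* Hypothetical fragment, not the real xen.lds.S. */
.text : {
        _stext = .;
        *(.text.header)                  /* must stay first in the image */
        *(.text.unlikely .text.*_unlikely .text.unlikely.*)
        *(.text.page_aligned)            /* explicit: page alignment */
        *(.text)
        /* Per-function sections from -ffunction-sections: let GNU ld
         * order these by descending alignment to reduce padding. */
        *(SORT_BY_ALIGNMENT(.text.*))
        _etext = .;
}
```

The global alternative, `ld --sort-section=alignment`, applies the same sorting to every wildcard pattern in the script at once, which is the "scope" concern Jan raises above.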
Re: [PATCH] x86/kexec: Fix kexec-reboot with CET active
On 08.03.2022 17:22, Andrew Cooper wrote:
> On 08/03/2022 08:15, Jan Beulich wrote:
>> On 07.03.2022 21:53, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/machine_kexec.c
>>> +++ b/xen/arch/x86/machine_kexec.c
>>> @@ -156,6 +156,16 @@ void machine_kexec(struct kexec_image *image)
>>>       */
>>>      local_irq_disable();
>>>
>>> +    /* Reset CPUID masking and faulting to the host's default. */
>>> +    ctxt_switch_levelling(NULL);
>>> +
>>> +    /* Disable CET. */
>>> +    if ( read_cr4() & X86_CR4_CET )
>>> +    {
>>> +        wrmsrl(MSR_S_CET, 0);
>>> +        write_cr4(read_cr4() & ~X86_CR4_CET);
>>> +    }
>>> +
>>>      /* Now regular interrupts are disabled, we need to reduce the impact
>>>       * of interrupts not disabled by 'cli'.
>>>       *
>> Besides introducing somewhat of a disconnect between the comment in
>> context here and the earlier local_irq_disable(), is it really
>> necessary to do both actions with IRQs off?
>
> We are a handful of instructions away from discarding Xen's context
> entirely. IRQs are not a relevant concern.

Well, as said - the comment was what caught my eye. But as you appear
to think that slight disconnect is not an issue: I don't mean my remark
to be an objection. Feel free to commit with David's R-b.

Jan
Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations
On 08.03.2022 17:26, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 05:15:33PM +0100, Roger Pau Monné wrote:
>> On Tue, Mar 08, 2022 at 04:45:34PM +0100, Jan Beulich wrote:
>>> On 08.03.2022 16:36, Bjoern Doebel wrote:
>>>> --- a/xen/arch/x86/livepatch.c
>>>> +++ b/xen/arch/x86/livepatch.c
>>>> @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
>>>>
>>>>      val -= (uint64_t)dest;
>>>>      *(int32_t *)dest = val;
>>>
>>> Afaict after this assignment ...
>>>
>>>> -    if ( (int64_t)val != *(int32_t *)dest )
>>>> +    if ( (int32_t)val != *(int32_t *)dest )
>>>
>>> ... this condition can never be false. The cast really wants to be
>>> to int64_t, and the overflow you saw being reported is quite likely
>>> for a different reason. But from the sole message you did quote
>>> it's not really possible to figure what else is wrong.
>>
>> It seems Linux has that check ifdef'ed [0], but there's no reference
>> as to why it's that way (I've tracked it back to the x86-64 import on
>> the historical tree, f3081f5b66a06175ff2dabfe885a53fb04e71076).
>>
>> It's a 64bit relocation using a 32bit value, but it's unclear to me
>> that modifying the top 32bits is not allowed/intended.
>
> Urg, I've worded this very badly. It's a 64bit relocation using a
> 32bit value, but it's unclear to me that modifying the top 32bits is
> not allowed/intended and fine to be dropped.

I'm still confused: Afaics this is in the handling of R_X86_64_PC32,
which is a 32-bit relocation. Only a 32-bit field in memory is to be
modified, and the resulting value needs to fit such that when the CPU
fetches it and sign-extends it to 64 bits, the original value is
re-established. Hence the check, aiui.

Jan
Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when computing ELF relocations
> From: Roger Pau Monne
> Sent: Tuesday, March 8, 2022 4:26 PM
> To: Bjoern Doebel ; Jan Beulich
> Cc: Michael Kurth ; Martin Pohlack ; Konrad Rzeszutek Wilk ;
> Ross Lagerwall ; xen-devel@lists.xenproject.org
> Subject: Re: [RFC PATCH v1] arch/x86: Livepatch: fix overflow check when
> computing ELF relocations
>
> On Tue, Mar 08, 2022 at 05:15:33PM +0100, Roger Pau Monné wrote:
> > On Tue, Mar 08, 2022 at 04:45:34PM +0100, Jan Beulich wrote:
> > > On 08.03.2022 16:36, Bjoern Doebel wrote:
> > > > --- a/xen/arch/x86/livepatch.c
> > > > +++ b/xen/arch/x86/livepatch.c
> > > > @@ -339,7 +339,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
> > > >
> > > >      val -= (uint64_t)dest;
> > > >      *(int32_t *)dest = val;
> > >
> > > Afaict after this assignment ...
> > >
> > > > -    if ( (int64_t)val != *(int32_t *)dest )
> > > > +    if ( (int32_t)val != *(int32_t *)dest )
> > >
> > > ... this condition can never be false. The cast really wants to be
> > > to int64_t, and the overflow you saw being reported is quite likely
> > > for a different reason. But from the sole message you did quote
> > > it's not really possible to figure what else is wrong.
> >
> > It seems Linux has that check ifdef'ed [0], but there's no reference
> > as to why it's that way (I've tracked it back to the x86-64 import on
> > the historical tree, f3081f5b66a06175ff2dabfe885a53fb04e71076).
> >
> > It's a 64bit relocation using a 32bit value, but it's unclear to me
> > that modifying the top 32bits is not allowed/intended.
>
> Urg, I've worded this very badly. It's a 64bit relocation using a
> 32bit value, but it's unclear to me that modifying the top 32bits is
> not allowed/intended and fine to be dropped.
>
> Thanks, Roger.

I'm not sure what you mean by that. The value is computed based on the
load address and the address of the target symbol - i.e. it is a
PC-relative relocation, and the code is checking that the computed
relative value hasn't overflowed the 32-bit destination in memory, e.g.
in the unlikely case that the live patch is loaded far away in memory
from the hypervisor.

The code looks correct to me. It needs investigation to find out why
this particular patch is causing an issue since the code is unchanged
since v7 of the original xSplice patch series.

Ross
Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions
On 08.03.22 17:01, Ross Lagerwall wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. From: Doebel, Bjoern Sent: Tuesday, March 8, 2022 3:41 PM To: Ross Lagerwall ; xen-devel@lists.xenproject.org Cc: Michael Kurth ; Martin Pohlack ; Roger Pau Monne ; Andrew Cooper ; Konrad Rzeszutek Wilk Subject: Re: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions On 08.03.22 16:25, Ross Lagerwall wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. From: Bjoern Doebel Sent: Tuesday, March 8, 2022 10:29 AM To: xen-devel@lists.xenproject.org Cc: Michael Kurth ; Martin Pohlack ; Roger Pau Monne ; Andrew Cooper ; Bjoern Doebel ; Konrad Rzeszutek Wilk ; Ross Lagerwall Subject: [PATCH v3 2/2] xen/x86: Livepatch: support patching CET-enhanced functions Xen enabled CET for supporting architectures. The control flow aspect of CET expects functions that can be called indirectly (i.e., via function pointers) to start with an ENDBR64 instruction. Otherwise a control flow exception is raised. This expectation breaks livepatching flows because we patch functions by overwriting their first 5 bytes with a JMP + , thus breaking the ENDBR64. We fix this by checking the start of a patched function for being ENDBR64. In the positive case we move the livepatch JMP to start behind the ENDBR64 instruction. To avoid having to guess the ENDBR64 offset again on patch reversal (which might race with other mechanisms adding/removing ENDBR dynamically), use the livepatch metadata to store the computed offset along with the saved bytes of the overwritten function. 
Signed-off-by: Bjoern Doebel
CC: Konrad Rzeszutek Wilk
CC: Ross Lagerwall

Note that on top of livepatching functions, Xen supports an additional mode where we can "remove" a function by overwriting it with NOPs. This is only supported for functions up to 31 bytes in size, and this patch reduces this limit to 30 bytes.

Changes since r1:
* use sizeof_field() to avoid unused variable warning
* make metadata variable const in arch_livepatch_revert
---
 xen/arch/x86/livepatch.c | 61 ++--
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
index 65530c1e57..0fd97f2a00 100644
--- a/xen/arch/x86/livepatch.c
+++ b/xen/arch/x86/livepatch.c
@@ -14,11 +14,29 @@
 #include
 #include
+#include
 #include
 #include
 #include
 #include

+/*
+ * CET hotpatching support: We may have functions starting with an ENDBR64
+ * instruction that MUST remain the first instruction of the function, hence
+ * we need to move any hotpatch trampoline further into the function. For that
+ * we need to keep track of the patching offset used for any loaded hotpatch
+ * (to avoid racing against other fixups adding/removing ENDBR64 or similar
+ * instructions).
+ *
+ * We do so by making use of the existing opaque metadata area. We use its
+ * first 4 bytes to track the offset into the function used for patching and
+ * the remainder of the data to store overwritten code bytes.
+ */
+struct x86_livepatch_meta {
+    uint8_t patch_offset;
+    uint8_t instruction[LIVEPATCH_OPAQUE_SIZE - sizeof(uint8_t)];
+};
+

I think it would make things a bit simpler to use one of the spare _pad bits from struct livepatch_func rather than hacking it into the opaque area. Is there a reason you chose this approach?

No specific reason. Are you suggesting updating the public livepatch interface to add a new member and reduce the padding size by 1 byte? I guess that will also require a patch to livepatch-build-tools?
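As a rough sketch of the detection step the patch describes (a hypothetical helper, not the patch's actual code): ENDBR64 is the 4-byte sequence F3 0F 1E FA, so the patch offset becomes 4 when a function starts with it and 0 otherwise:

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: check whether a function body starts with ENDBR64
 * (F3 0F 1E FA) and return the byte offset at which the livepatch JMP
 * may safely be written without clobbering it.
 */
static uint8_t livepatch_offset(const uint8_t *func)
{
    static const uint8_t endbr64[] = { 0xf3, 0x0f, 0x1e, 0xfa };

    return memcmp(func, endbr64, sizeof(endbr64)) == 0 ? sizeof(endbr64) : 0;
}
```

This is also why the NOP-out limit drops from 31 to 30 bytes: one byte of the opaque area now holds the offset instead of saved instruction bytes.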
Bjoern

struct livepatch_func contains info that the build tool needs to communicate to Xen, as well as space for Xen's internal bookkeeping while the live patch is applied. This includes the array for storing instructions, the applied flag, and now additionally the patch offset. (It's a somewhat odd arrangement but it's what we've got...) The build tool does not need to know the details about any of Xen's internal bookkeeping.

So my preference would be to have patch_offset as an additional member next to applied (reducing padding by 1) and then in livepatch-build-tools replace:

    unsigned char pad[MAX_REPLACEMENT_SIZE];
    uint8_t applied;
    uint8_t _pad[7];

with simply:

    uint8_t opaque[39];

What do you think?

That will simplify this patch - I like it. Will send update + livepatch-build patch.

Bjoern

Amazon Development Center Germany GmbH, Krausenstr. 38, 10117 Berlin. Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss. Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B. Sitz: Berlin. Ust-ID: DE 289 237 879
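The arithmetic behind the suggested change can be sanity-checked with a toy model (only the bookkeeping tail of struct livepatch_func is modelled, all other fields omitted): 31 saved-instruction bytes plus the applied flag plus 7 padding bytes equal the proposed 39-byte opaque blob, so the size seen by the build tool does not change:

```c
#include <stdint.h>

/* Toy model of the tail of struct livepatch_func, before and after the
 * change discussed above (field names abbreviated, assumptions mine). */
struct tail_before {
    uint8_t opaque[31];   /* saved bytes of the overwritten function */
    uint8_t applied;
    uint8_t _pad[7];
};

struct tail_after {
    uint8_t opaque[39];   /* one opaque blob from the build tool's view */
};
```

Since every member has byte alignment, no padding is inserted and both layouts occupy exactly 39 bytes.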
Re: [PATCH v3 1/2] xen/build: put image header into a separate section
On 08.03.2022 17:36, Roger Pau Monné wrote: > On Tue, Mar 08, 2022 at 04:08:53PM +0100, Jan Beulich wrote: >> On 08.03.2022 15:18, Roger Pau Monné wrote: >>> On Tue, Mar 08, 2022 at 02:57:23PM +0100, Jan Beulich wrote: On 08.03.2022 14:49, Roger Pau Monne wrote: > So it can be explicitly placed ahead of the rest of the .text content > in the linker script (and thus the resulting image). This is a > prerequisite for further work that will add a catch-all to the text > section (.text.*). > > Note that placement of the sections inside of .text is also slightly > adjusted to be more similar to the position found in the default GNU > ld linker script. > > The special handling of the object file containing the header data as > the first object file passed to the linker command line can also be > removed. > > While there also remove the special handling of efi/ on x86. There's > no need for the resulting object file to be passed in any special > order to the linker. > > Signed-off-by: Roger Pau Monné Looks good to me, but I have one question before feeling ready to offer R-b: > @@ -86,8 +84,13 @@ SECTIONS > *(.text.kexec) /* Page aligned in the object file. */ > kexec_reloc_end = .; > > - *(.text.cold) > - *(.text.unlikely) > + *(.text.cold .text.cold.*) > + *(.text.unlikely .text.*_unlikely .text.unlikely.*) What generates .text.*_unlikely? And if anything really does, why would .text.cold not have a similar equivalent? >>> >>> That matches what I saw in the default linker script from my version >>> of GNU ld: >>> >>> *(.text.unlikely .text.*_unlikely .text.unlikely.*) >>> >>> I really don't know what could generate .text.*_unlikely, but since >>> it's part of the default linker script I assumed it was better to just >>> add it. >> >> I've checked - gcc up to 4.5.x would generate .text.*_unlikely; from >> 4.6.x. onwards it would be .text.unlikely.*. >> >> As to the dissimilarity with .text.cold: I wonder why we have that in >> the first place. 
It matches our __cold attribute, just that we don't >> use that anywhere afaics. >> >> In any event: >> Reviewed-by: Jan Beulich >> albeit preferably with .text.cold.* dropped again. > > Would you mind dropping the .text.cold.* at commit? Otherwise I can > resend. Sure; no need to resend just for this. Jan
Re: [PATCH v3 2/2] livepatch: set -f{function,data}-sections compiler option
On 08.03.2022 17:41, Roger Pau Monné wrote:
> On Tue, Mar 08, 2022 at 04:13:55PM +0100, Jan Beulich wrote:
>> On 08.03.2022 15:46, Roger Pau Monné wrote:
>>> On Tue, Mar 08, 2022 at 03:09:17PM +0100, Jan Beulich wrote:
On 08.03.2022 14:49, Roger Pau Monne wrote:
> If livepatching support is enabled build the hypervisor with
> -f{function,data}-sections compiler options, which is required by the
> livepatching tools to detect changes and create livepatches.
>
> This shouldn't result in any functional change on the hypervisor
> binary image, but does however require some changes in the linker
> script in order to handle that each function and data item will now be
> placed into its own section in object files. As a result add catch-all
> for .text, .data and .bss in order to merge each individual item
> section into the final image.
>
> The main difference will be that .text.startup will end up being part
> of .text rather than .init, and thus won't be freed. .text.exit will
> also be part of .text rather than dropped. Overall this could make the
> image bigger, and package some .text code in a sub-optimal way.
>
> On Arm the .data.read_mostly needs to be moved ahead of the .data
> section like it's already done on x86, so the .data.* catch-all
> doesn't also include .data.read_mostly. The alignment of
> .data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end
> up being placed at the tail of a read-only page from the previous
> section. While there move the alignment of the .data section ahead of
> the section declaration, like it's done for other sections.
>
> The benefit of having CONFIG_LIVEPATCH enable those compiler options
> is that the livepatch build tools no longer need to fiddle with the
> build system in order to enable them. Note the current livepatch tools
> are broken after the recent build changes due to the way they
> attempt to set -f{function,data}-sections.
> > Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich > --- a/xen/arch/x86/xen.lds.S > +++ b/xen/arch/x86/xen.lds.S > @@ -88,6 +88,9 @@ SECTIONS > *(.text.unlikely .text.*_unlikely .text.unlikely.*) > > *(.text) > +#ifdef CONFIG_CC_SPLIT_SECTIONS > + *(.text.*) > +#endif > *(.text.__x86_indirect_thunk_*) > *(.text.page_aligned) These last two now will not have any effect anymore when CC_SPLIT_SECTIONS=y. This may have undesirable effects on the overall size when there is more than one object with a .text.page_aligned contribution. In .data ... >>> >>> Agreed. I wondered whether to move those ahead of the main text >>> section, so likely: >>> >>>*(.text.unlikely .text.*_unlikely .text.unlikely.*) >>> >>>*(.text.page_aligned) >>>*(.text.__x86_indirect_thunk_*) >>>*(.text) >>> #ifdef CONFIG_CC_SPLIT_SECTIONS >>>*(.text.*) >>> #endif >> >> Perhaps; I'm not really worried of .text.__x86_indirect_thunk_*, >> though. When adding .text.* that one can likely go away. >> >>> FWIW, Linux seems fine to package .text.page_aligned together with the >>> rest of .text using the .text.[0-9a-zA-Z_]* catch-all. >> >> There's no question this is functionally fine. The question is how >> many extra padding areas are inserted because of this. >> > @@ -292,9 +295,7 @@ SECTIONS > >DECL_SECTION(.data) { > *(.data.page_aligned) > - *(.data) > - *(.data.rel) > - *(.data.rel.*) > + *(.data .data.*) >} PHDR(text) ... this continues to be named first. I wonder whether we wouldn't want to use SORT_BY_ALIGNMENT (if available) instead in both places. >>> >>> We could use the command line option if available >>> (--sort-section=alignment) to sort all wildcard sections? >> >> Depends on the scope of the sorting that would result when enabled >> globally like this. > > I'm not sure I'm following. Don't we generally want to sort by > alignment in order to avoid adding unnecessary padding as much as > possible? > > For any wildcard sections we really don't care anymore how they are > sorted? 
Sure. The question is whether sorting is limited to within any single *(...) construct, or whether it could extend to adjacent ones. IOW whether the command line option strictly is a replacement of adding SORT_BY_ALIGNMENT to every one of these constructs.

Jan
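For reference, the per-construct variant being weighed here would look roughly like the sketch below (not a proposed patch). SORT_BY_ALIGNMENT() sorts only the input sections matched by the pattern it wraps, while --sort-section=alignment applies to wildcard patterns script-wide:

```
.data : {
        *(.data.page_aligned)
        *(SORT_BY_ALIGNMENT(.data) SORT_BY_ALIGNMENT(.data.*))
}
```

Placing .data.page_aligned outside the sorted pattern preserves its explicit first position while still letting the catch-all be alignment-sorted.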
[PATCH v6 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86
Add XEN_SYSCTL_PHYSCAP_X86_ASSISTED_XAPIC and XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X2APIC to report accelerated xapic and x2apic on x86 hardware. No such features are currently implemented on AMD hardware.

HW assisted xAPIC virtualization will be reported if HW, at the minimum, supports virtualize_apic_accesses, as this feature alone means that an access to the APIC page will cause an APIC-access VM exit. An APIC-access VM exit provides a VMM with information about the access causing the VM exit, unlike a regular EPT fault, thus simplifying some internal handling.

HW assisted x2APIC virtualization will be reported if HW supports virtualize_x2apic_mode and, at least, either apic_reg_virt or virtual_intr_delivery. This also means that sysctl follows the conditionals in vmx_vlapic_msr_changed().

For that purpose, also add an arch-specific "capabilities" parameter to struct xen_sysctl_physinfo.

Note that this interface is intended to be compatible with AMD so that AVIC support can be introduced in a future patch. Unlike Intel, which has multiple controls for APIC virtualization, AMD has one global 'AVIC Enable' control bit, so fine-graining of APIC virtualization control cannot be done on a common interface.
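The two reporting conditions can be summarised in a small sketch (plain booleans standing in for the corresponding cpu_has_vmx_* checks; illustrative only, not the patch's code):

```c
#include <stdbool.h>

/* An APIC-access VM exit alone is enough to accelerate xAPIC. */
static bool assisted_xapic_available(bool virtualize_apic_accesses)
{
    return virtualize_apic_accesses;
}

/* x2APIC acceleration additionally needs register virtualization or
 * virtual interrupt delivery on top of virtualize_x2apic_mode. */
static bool assisted_x2apic_available(bool virtualize_x2apic_mode,
                                      bool apic_reg_virt,
                                      bool virtual_intr_delivery)
{
    return virtualize_x2apic_mode &&
           (apic_reg_virt || virtual_intr_delivery);
}
```

These mirror the conditionals in vmx_vlapic_msr_changed() that the commit message refers to.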
Suggested-by: Andrew Cooper
Signed-off-by: Jane Malalane
---
CC: Wei Liu
CC: Anthony PERARD
CC: Juergen Gross
CC: Andrew Cooper
CC: George Dunlap
CC: Jan Beulich
CC: Julien Grall
CC: Stefano Stabellini
CC: Volodymyr Babchuk
CC: Bertrand Marquis
CC: Jun Nakajima
CC: Kevin Tian
CC: "Roger Pau Monné"

v6:
 * Limit abi check to x86
 * Fix coding style issue

v5:
 * Have assisted_xapic_available solely depend on cpu_has_vmx_virtualize_apic_accesses and assisted_x2apic_available depend on cpu_has_vmx_virtualize_x2apic_mode and cpu_has_vmx_apic_reg_virt OR cpu_has_vmx_virtual_intr_delivery

v4:
 * Fallback to the original v2/v1 conditions for setting assisted_xapic_available and assisted_x2apic_available so that in the future APIC virtualization can be exposed on AMD hardware since fine-graining of "AVIC" is not supported, i.e., AMD solely uses "AVIC Enable". This also means that sysctl mimics what's exposed in CPUID

v3:
 * Define XEN_SYSCTL_PHYSCAP_ARCH_MAX for ABI checking and actually set "arch_capabilities", via a call to c_bitmap_to_ocaml_list()
 * Have assisted_x2apic_available only depend on cpu_has_vmx_virtualize_x2apic_mode

v2:
 * Use one macro LIBXL_HAVE_PHYSINFO_ASSISTED_APIC instead of two
 * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo
 * Set assisted_x{2}apic_available to be conditional upon "bsp" and annotate it with __ro_after_init
 * Change XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_X{2}APIC to _X86_ASSISTED_X{2}APIC
 * Keep XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X{2}APIC contained within sysctl.h
 * Fix padding introduced in struct xen_sysctl_physinfo and bump XEN_SYSCTL_INTERFACE_VERSION
---
 tools/golang/xenlight/helpers.gen.go |  4
 tools/golang/xenlight/types.gen.go   |  2 ++
 tools/include/libxl.h                |  7 +++
 tools/libs/light/libxl.c             |  3 +++
 tools/libs/light/libxl_arch.h        |  4
 tools/libs/light/libxl_arm.c         |  5 +
 tools/libs/light/libxl_types.idl     |  2 ++
 tools/libs/light/libxl_x86.c         | 11 +++
 tools/ocaml/libs/xc/xenctrl.ml       |  5 +
 tools/ocaml/libs/xc/xenctrl.mli      |  5 +
 tools/ocaml/libs/xc/xenctrl_stubs.c  | 15 +--
 tools/xl/xl_info.c                   |  6 --
 xen/arch/x86/hvm/vmx/vmcs.c          |  9 +
 xen/arch/x86/include/asm/domain.h    |  3 +++
 xen/arch/x86/sysctl.c                |  7 +++
 xen/include/public/sysctl.h          | 11 ++-
 16 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index b746ff1081..dd4e6c9f14 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -3373,6 +3373,8 @@
 x.CapVmtrace = bool(xc.cap_vmtrace)
 x.CapVpmu = bool(xc.cap_vpmu)
 x.CapGnttabV1 = bool(xc.cap_gnttab_v1)
 x.CapGnttabV2 = bool(xc.cap_gnttab_v2)
+x.CapAssistedXapic = bool(xc.cap_assisted_xapic)
+x.CapAssistedX2Apic = bool(xc.cap_assisted_x2apic)
 return nil}

@@ -3407,6 +3409,8 @@
 xc.cap_vmtrace = C.bool(x.CapVmtrace)
 xc.cap_vpmu = C.bool(x.CapVpmu)
 xc.cap_gnttab_v1 = C.bool(x.CapGnttabV1)
 xc.cap_gnttab_v2 = C.bool(x.CapGnttabV2)
+xc.cap_assisted_xapic = C.bool(x.CapAssistedXapic)
+xc.cap_assisted_x2apic = C.bool(x.CapAssistedX2Apic)
 return nil
 }

diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go
index b1e84d5258..87be46c745 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -1014,6 +1014,8 @@
 CapVmtrace bool
 CapVpmu bool
 CapGnttabV1 bool
 CapGnttabV2 bool
+CapAssistedXapic bool
+CapAssistedX2Apic bool
 }

 type Connectorinfo struct {

diff --git a/tools/include/libxl.h in
[PATCH v6 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC
Introduce a new per-domain creation x86 specific flag to select whether hardware assisted virtualization should be used for x{2}APIC. A per-domain option is added to xl in order to select the usage of x{2}APIC hardware assisted virtualization, as well as a global configuration option. Having all APIC interaction exit to Xen for emulation is slow and can induce much overhead. Hardware can speed up x{2}APIC by decoding the APIC access and providing a VM exit with a more specific exit reason than a regular EPT fault or by altogether avoiding a VM exit. On the other hand, being able to disable x{2}APIC hardware assisted virtualization can be useful for testing and debugging purposes. Note: vmx_install_vlapic_mapping doesn't require modifications regardless of whether the guest has "Virtualize APIC accesses" enabled or not, i.e., setting the APIC_ACCESS_ADDR VMCS field is fine so long as virtualize_apic_accesses is supported by the CPU. Suggested-by: Andrew Cooper Signed-off-by: Jane Malalane --- CC: Wei Liu CC: Anthony PERARD CC: Juergen Gross CC: Andrew Cooper CC: George Dunlap CC: Jan Beulich CC: Julien Grall CC: Stefano Stabellini CC: Christian Lindig CC: David Scott CC: Volodymyr Babchuk CC: "Roger Pau Monné" v6: * Use ENODEV instead of EINVAL when rejecting assisted_x{2}apic for PV guests * Move has_assisted_x{2}apic macros out of an Intel specific header * Remove references to Intel specific features in documentation v5: * Revert v4 changes in vmx_vlapic_msr_changed(), preserving the use of the has_assisted_x{2}apic macros * Following changes in assisted_x{2}apic_available definitions in patch 1, retighten conditionals for setting XEN_HVM_CPUID_APIC_ACCESS_VIRT and XEN_HVM_CPUID_X2APIC_VIRT in cpuid_hypervisor_leaves() v4: * Add has_assisted_x{2}apic macros and use them where appropriate * Replace CPU checks with per-domain assisted_x{2}apic control options in vmx_vlapic_msr_changed() and cpuid_hypervisor_leaves(), following edits to assisted_x{2}apic_available 
definitions in patch 1

Note: new assisted_x{2}apic_available definitions make later cpu_has_vmx_apic_reg_virt and cpu_has_vmx_virtual_intr_delivery checks redundant in vmx_vlapic_msr_changed()

v3:
 * Change info in xl.cfg to better express reality and fix capitalization of x{2}apic
 * Move "physinfo" variable definition to the beginning of libxl__domain_build_info_setdefault()
 * Reposition brackets in if statement to match libxl coding style
 * Shorten logic in libxl__arch_domain_build_info_setdefault()
 * Correct dprintk message in arch_sanitise_domain_config()
 * Make appropriate changes in vmx_vlapic_msr_changed() and cpuid_hypervisor_leaves() for amended "assisted_x2apic" bit
 * Remove unneeded parentheses

v2:
 * Add a LIBXL_HAVE_ASSISTED_APIC macro
 * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo
 * Add a return statement in now "int" libxl__arch_domain_build_info_setdefault()
 * Preserve libxl__arch_domain_build_info_setdefault's location in libxl_create.c
 * Correct x{2}apic default setting logic in libxl__arch_domain_prepare_config()
 * Correct logic for parsing assisted_x{2}apic host/guest options in xl_parse.c and initialize them to -1 in xl.c
 * Use guest options directly in vmx_vlapic_msr_changed
 * Fix indentation of bool assisted_x{2}apic in struct hvm_domain
 * Add a change in xenctrl_stubs.c to pass xenctrl ABI checks
---
 docs/man/xl.cfg.5.pod.in              | 15 +++
 docs/man/xl.conf.5.pod.in             | 12
 tools/golang/xenlight/helpers.gen.go  | 12
 tools/golang/xenlight/types.gen.go    |  2 ++
 tools/include/libxl.h                 |  7 +++
 tools/libs/light/libxl_arch.h         |  5 +++--
 tools/libs/light/libxl_arm.c          |  7 +--
 tools/libs/light/libxl_create.c       | 22 +-
 tools/libs/light/libxl_types.idl      |  2 ++
 tools/libs/light/libxl_x86.c          | 27 +--
 tools/ocaml/libs/xc/xenctrl.ml        |  2 ++
 tools/ocaml/libs/xc/xenctrl.mli       |  2 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c   |  2 +-
 tools/xl/xl.c                         |  8
 tools/xl/xl.h                         |  2 ++
 tools/xl/xl_parse.c                   | 16
 xen/arch/x86/domain.c                 | 28 +++-
 xen/arch/x86/hvm/vmx/vmcs.c           |  4
 xen/arch/x86/hvm/vmx/vmx.c            | 13 -
 xen/arch/x86/include/asm/hvm/domain.h |  6 ++
 xen/arch/x86/include/asm/hvm/hvm.h    |  6 ++
 xen/arch/x86/traps.c                  |  5 +++--
 xen/include/public/arch-x86/xen.h     |  2 ++
 23 files changed, 179 insertions(+), 28 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index b98d161398..b4239fcc5e 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -1862,6 +1862,21 @@ firmware tables when using certain older guest Operating Sy
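By way of illustration, guest configuration would then look something like the fragment below. The option names are inferred from the patch description and documentation diff above, so they are an assumption rather than a quote from the committed docs:

```
# Hypothetical xl.cfg fragment: per-domain control of hardware-assisted
# APIC virtualization (option names assumed, not authoritative).
assisted_xapic = 1
assisted_x2apic = 1
```

Disabling either option would force the corresponding APIC interactions back through Xen's software emulation, which the commit message notes is useful for testing and debugging.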
Re: [XEN v9 3/4] xen/arm64: io: Handle the abort due to access to stage1 translation table
Hi,

On 08/03/2022 11:22, Ayan Kumar Halder wrote:

Hi Julien,

On 07/03/2022 23:59, Julien Grall wrote:

Hi,

On 07/03/2022 22:23, Ayan Kumar Halder wrote:
On 07/03/2022 19:37, Julien Grall wrote:
On 07/03/2022 14:27, Ayan Kumar Halder wrote:

Hi Julien,

Hi Ayan,

Hi Julien, I need a bit of clarification to understand this. One clarification.

On 04/03/2022 10:39, Julien Grall wrote:

Hi Ayan,

On 01/03/2022 12:40, Ayan Kumar Halder wrote:

If the abort was caused due to access to the stage1 translation table, Xen will assume that the stage1 translation table is in the non-MMIO region.

Reading this commit message again, I think you want to explain why we want to do that, because, from my understanding, this is technically not forbidden by the Arm Arm. From the previous discussion, we want to do this because we can't easily handle such a fault on an emulated region (we have no way to return the value read to the walker).

Sorry, can you explain this a bit more? Do you mean that if the page table is located in the emulated region, map_domain_page() (called from p2m_next_level()) will fail?

For a data abort with valid syndrome, you will have a register to write back the value read. When the data abort has s1ptw == 1, AFAICT, we have no information how to return the value.

Do you mean that for s1ptw, we get an intermediate physical address?

if ( hpfar_is_valid(xabt.s1ptw, fsc) )
    gpa = get_faulting_ipa(gva);

No. That's not relevant here. We always need an IPA in order to deal with the fault. If the IPA corresponds to an emulated region, then Xen can read the emulated address, but can't return the value to the guest OS.

That wouldn't be the guest OS, but the page-table walker in the CPU.

Can Linux or any OS keep its page tables in the direct MMIO space? If yes, then try_map_mmio() needs to be invoked to map the region, so that the OS can access it. If not, then Xen needs to return an abort because the OS may be behaving maliciously.

I think what matters is whether the Arm Arm would or would not allow it.
From what I can tell there is no such restriction, so we would need to be cautious not to restrict it further than necessary.

My understanding from previous discussion was that it does not make sense for Linux or any OS to keep its page tables in any MMIO region (emulated or direct). Please correct me if mistaken.

At the moment, none of the regions emulated by Xen could be used for page-tables. I am also not sure how we should handle such accesses if they arise. So it is more convenient to simply forbid them. Regarding direct MMIO, see above. Correct me if I am wrong, but it should not be a problem for Xen to deal with them. So while I agree this doesn't seem to make sense, the restriction seems unnecessary.

So the behavior will be:

1. If the stage1 translation table is in the non-MMIO region or 'direct mapped' MMIO region, then invoke p2m_resolve_translation_fault() and try_map_mmio() to resolve the fault. If it succeeds, then return to the guest to retry.

When 1. happens we don't know yet whether the stage1 table will be in a non-MMIO region or 'direct mapped'. Instead, we only know that the ISS syndrome is invalid. If it is a prefetch abort or the syndrome is invalid, then we should call check_p2m().

2. If the previous step fails, and for any other scenario (i.e. the stage1 translation table is in an emulated MMIO region or the address is invalid), return the abort to the guest.

If check_p2m() didn't succeed and it is a fault on a stage-1 walk, then we should send an abort.

Cheers,

--
Julien Grall
Re: [PATCH v4] vpci/msix: fix PBA accesses
On Tue, 2022-03-08 at 09:31 +0100, Jan Beulich wrote: > On 07.03.2022 17:37, Roger Pau Monne wrote: > > Map the PBA in order to access it from the MSI-X read and write > > handlers. Note that previously the handlers would pass the physical > > host address into the {read,write}{l,q} handlers, which is wrong as > > those expect a linear address. > > > > Map the PBA using ioremap when the first access is performed. Note > > that 32bit arches might want to abstract the call to ioremap into a > > vPCI arch handler, so they can use a fixmap range to map the PBA. > > > > Reported-by: Jan Beulich > > Signed-off-by: Roger Pau Monné > > Reviewed-by: Jan Beulich > > > Cc: Alex Olson > > I'll wait a little with committing, in the hope for Alex to re-provide > a Tested-by. It works fine for me, you can add "Tested-by: alex.ol...@starlab.io" to the commit. > > > --- a/xen/drivers/vpci/msix.c > > +++ b/xen/drivers/vpci/msix.c > > @@ -182,6 +182,38 @@ static struct vpci_msix_entry *get_entry(struct > > vpci_msix *msix, > > return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE]; > > } > > > > +static void __iomem *get_pba(struct vpci *vpci) > > +{ > > +struct vpci_msix *msix = vpci->msix; > > +/* > > + * PBA will only be unmapped when the device is deassigned, so access > > it > > + * without holding the vpci lock. > > + */ > > +void __iomem *pba = read_atomic(&msix->pba); > > + > > +if ( likely(pba) ) > > +return pba; > > + > > +pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA), > > + vmsix_table_size(vpci, VPCI_MSIX_PBA)); > > +if ( !pba ) > > +return read_atomic(&msix->pba); > > + > > +spin_lock(&vpci->lock); > > +if ( !msix->pba ) > > +{ > > +write_atomic(&msix->pba, pba); > > +spin_unlock(&vpci->lock); > > +} > > +else > > +{ > > +spin_unlock(&vpci->lock); > > +iounmap(pba); > > +} > > TBH I had been hoping for just a single spin_unlock(), but you're > the maintainer of this code ... > > Jan > Thanks -Alex
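The locking pattern in get_pba() above (read without the lock, then either publish under the lock or throw away the losing mapping) can be reduced to a toy model. This is illustrative C, not vPCI code; 'shared' stands in for msix->pba and the int pointers for ioremap() results:

```c
#include <stdbool.h>
#include <stddef.h>

static int *shared;   /* plays the role of msix->pba */

/*
 * Publish-or-discard: the first caller to reach the locked section wins
 * and keeps its mapping; a loser must unmap its own copy, which is the
 * iounmap() branch in the patch.  The real code additionally uses
 * read_atomic()/write_atomic() so lock-free readers see a consistent
 * pointer, and takes vpci->lock where the comment below indicates.
 */
static bool publish(int *mine)
{
    /* spin_lock(&vpci->lock) would be taken here */
    if ( shared == NULL )
    {
        shared = mine;
        return true;    /* published: keep the mapping */
    }
    return false;       /* lost the race: caller unmaps 'mine' */
}
```

This is why a single unlock would not suffice cleanly: the winner must unlock after publishing, while the loser must unlock before discarding its redundant mapping.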
Re: [PATCH 1/3] xen/arm: Add i.MX lpuart driver
On 28/02/2022 09:27, Peng Fan wrote:

Hi Julien,

Hi Peng,

Subject: Re: [PATCH 1/3] xen/arm: Add i.MX lpuart driver

Hi Peng,

On 28/02/2022 01:07, Peng Fan (OSS) wrote:
From: Peng Fan

Can you give me a link to the specification and/or a similar driver in Linux?

https://www.nxp.com/webapp/Download?colCode=IMX8QMIEC Chapter 13.6 Low Power Universal Asynchronous Receiver/Transmitter (LPUART). But this requires registration to access.

Ok. I think it would still be valuable to add the link to the spec in the commit message.

Linux driver: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/tty/serial/fsl_lpuart.c

I would also add a link to the Linux code just as a reference (if one doesn't want to register).

Cheers,

--
Julien Grall