Re: [Xen-devel] [PATCH 11/13] x86/paravirt: Add paravirt alternatives infrastructure
On Mon, Oct 16, 2017 at 2:18 PM, Boris Ostrovsky wrote:
> On 10/12/2017 03:53 PM, Boris Ostrovsky wrote:
>> On 10/12/2017 03:27 PM, Andrew Cooper wrote:
>>> On 12/10/17 20:11, Boris Ostrovsky wrote:
>>>> There is also another problem:
>>>>
>>>> [1.312425] general protection fault: [#1] SMP
>>>> [1.312901] Modules linked in:
>>>> [1.313389] CPU: 0 PID: 1 Comm: init Not tainted 4.14.0-rc4+ #6
>>>> [1.313878] task: 88003e2c task.stack: c938c000
>>>> [1.314360] RIP: e030:entry_SYSCALL_64_fastpath+0x1/0xa5
>>>> [1.314854] RSP: e02b:c938ff50 EFLAGS: 00010046
>>>> [1.315336] RAX: 000c RBX: 55f550168040 RCX: 7fcfc959f59a
>>>> [1.315827] RDX:  RSI:  RDI:
>>>> [1.316315] RBP: 000a R08: 037f R09: 0064
>>>> [1.316805] R10: 1f89cbf5 R11: 88003e2c R12: 7fcfc958ad60
>>>> [1.317300] R13:  R14: 55f550185954 R15: 1000
>>>> [1.317801] FS: () GS:88003f80() knlGS:
>>>> [1.318267] CS: e033 DS: ES: CR0: 80050033
>>>> [1.318750] CR2: 7fcfc97ab218 CR3: 3c88e000 CR4: 00042660
>>>> [1.319235] Call Trace:
>>>> [1.319700] Code: 51 50 57 56 52 51 6a da 41 50 41 51 41 52 41 53 48
>>>> 83 ec 30 65 4c 8b 1c 25 c0 d2 00 00 41 f7 03 df 39 08 90 0f 85 a5 00 00
>>>> 00 50 <ff> 15 9c 95 d0 ff 58 48 3d 4c 01 00 00 77 0f 4c 89 d1 ff 14 c5
>>>> [1.321161] RIP: entry_SYSCALL_64_fastpath+0x1/0xa5 RSP: c938ff50
>>>> [1.344255] ---[ end trace d7cb8cd6cd7c294c ]---
>>>> [1.345009] Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
>>>>
>>>> All code
>>>> ========
>>>>    0:  51                      push   %rcx
>>>>    1:  50                      push   %rax
>>>>    2:  57                      push   %rdi
>>>>    3:  56                      push   %rsi
>>>>    4:  52                      push   %rdx
>>>>    5:  51                      push   %rcx
>>>>    6:  6a da                   pushq  $0xffda
>>>>    8:  41 50                   push   %r8
>>>>    a:  41 51                   push   %r9
>>>>    c:  41 52                   push   %r10
>>>>    e:  41 53                   push   %r11
>>>>   10:  48 83 ec 30             sub    $0x30,%rsp
>>>>   14:  65 4c 8b 1c 25 c0 d2    mov    %gs:0xd2c0,%r11
>>>>   1b:  00 00
>>>>   1d:  41 f7 03 df 39 08 90    testl  $0x900839df,(%r11)
>>>>   24:  0f 85 a5 00 00 00       jne    0xcf
>>>>   2a:  50                      push   %rax
>>>>   2b:* ff 15 9c 95 d0 ff       callq  *-0x2f6a64(%rip)    # 0xffd095cd    <-- trapping instruction
>>>>   31:  58                      pop    %rax
>>>>   32:  48 3d 4c 01 00 00       cmp    $0x14c,%rax
>>>>   38:  77 0f                   ja     0x49
>>>>   3a:  4c 89 d1                mov    %r10,%rcx
>>>>   3d:  ff                      .byte 0xff
>>>>   3e:  14 c5                   adc    $0xc5,%al
>>>>
>>>> so the original 'cli' was replaced with the pv call but to me the offset
>>>> looks a bit off, no? Shouldn't it always be positive?
>>> callq takes a 32bit signed displacement, so jumping back by up to 2G is
>>> perfectly legitimate.
>> Yes, but
>>
>> ostr@workbase> nm vmlinux | grep entry_SYSCALL_64_fastpath
>> ffffffff817365dd t entry_SYSCALL_64_fastpath
>> ostr@workbase> nm vmlinux | grep " pv_irq_ops"
>> ffffffff81c2dbc0 D pv_irq_ops
>> ostr@workbase>
>>
>> so pv_irq_ops.irq_disable is about 5MB ahead of where we are now. (I
>> didn't mean that the x86 instruction set doesn't allow negative
>> displacement, I was trying to say that pv_irq_ops always lives further down)
>
> I believe the problem is this:
>
> #define PV_INDIRECT(addr)       *addr(%rip)
>
> The displacement that the linker computes will be relative to where the
> instruction sits at link time, so when the alternatives patching copies
> it to its final location the RIP-relative reference no longer points at
> pv_irq_ops.
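The arithmetic behind that failure can be sketched in a few lines of standalone C. The pv_irq_ops address is taken from the nm output above; the other two addresses are illustrative, with the link-time RIP chosen so the displacement matches the -0x2f6a64 seen in the oops:

/*
 * Sketch: a RIP-relative operand encodes disp32 = target - rip_of_next_insn,
 * fixed at link time.  Copying the instruction bytes elsewhere, as the pv
 * alternatives patching does, leaves disp32 unchanged, so the reference
 * resolves to a different (bogus) address at the patch site.
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t target      = 0xffffffff81c2dbc0ULL;  /* pv_irq_ops (from nm) */
        uint64_t link_rip    = 0xffffffff81f24624ULL;  /* next-insn RIP at link time (illustrative) */
        int32_t  disp        = (int32_t)(target - link_rip);  /* -0x2f6a64, as in the oops */

        /* The instruction bytes are copied verbatim to the patch site: */
        uint64_t patched_rip = 0xffffffff8173660eULL;  /* entry_SYSCALL_64_fastpath+0x31 */

        printf("intended target: %#llx\n", (unsigned long long)target);
        printf("actual target:   %#llx\n",
               (unsigned long long)(patched_rip + (int64_t)disp));
        return 0;
}

The second printf lands roughly 5MB short of pv_irq_ops, which is exactly the kind of wild indirect call that produces the general protection fault above.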
Re: [Xen-devel] [RFC 06/22] kvm: Adapt assembly for PIE support
On Tue, Jul 18, 2017 at 6:33 PM, Thomas Garnier wrote:
> Change the assembly code to use only relative references to symbols for the
> kernel to be PIE compatible. The new __ASM_GET_PTR_PRE macro is used to
> get the address of a symbol on both 32 and 64-bit with PIE support.
>
> Position Independent Executable (PIE) support will allow extending the
> KASLR randomization range below the -2G memory limit.
>
> Signed-off-by: Thomas Garnier
> ---
>  arch/x86/include/asm/kvm_host.h | 6 ++--
>  arch/x86/kernel/kvm.c           | 6 ++--
>  arch/x86/kvm/svm.c              | 4 ++--
>  3 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 87ac4fba6d8e..3041201a3aeb 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1352,9 +1352,11 @@ asmlinkage void kvm_spurious_fault(void);
>       ".pushsection .fixup, \"ax\" \n" \
>       "667: \n\t" \
>       cleanup_insn "\n\t" \
> -     "cmpb $0, kvm_rebooting \n\t" \
> +     "cmpb $0, kvm_rebooting" __ASM_SEL(,(%%rip)) " \n\t" \
>       "jne 668b \n\t" \
> -     __ASM_SIZE(push) " $666b \n\t" \
> +     __ASM_SIZE(push) "%%" _ASM_AX " \n\t" \
> +     __ASM_GET_PTR_PRE(666b) "%%" _ASM_AX "\n\t" \
> +     "xchg %%" _ASM_AX ", (%%" _ASM_SP ") \n\t" \
>       "call kvm_spurious_fault \n\t" \
>       ".popsection \n\t" \
>       _ASM_EXTABLE(666b, 667b)
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 71c17a5be983..53b8ad162589 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -618,8 +618,10 @@ asm(
>  ".global __raw_callee_save___kvm_vcpu_is_preempted;"
>  ".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
>  "__raw_callee_save___kvm_vcpu_is_preempted:"
> -"movq __per_cpu_offset(,%rdi,8), %rax;"
> -"cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
> +"leaq __per_cpu_offset(%rip), %rax;"
> +"movq (%rax,%rdi,8), %rax;"
> +"addq " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rip), %rax;"

This doesn't look right.  It's accessing a per-cpu variable.  The
per-cpu section is an absolute, zero-based section and not subject to
relocation.

> +"cmpb $0, (%rax);"
>  "setne %al;"
>  "ret;"
>  ".popsection");

--
Brian Gerst
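Brian's objection becomes clearer with the addressing math written out. A minimal C sketch of x86-64 per-cpu addressing follows; the names mirror the kernel's but the code is illustrative only:

/*
 * Sketch: the .data..percpu section is linked at virtual address 0, so a
 * per-cpu symbol like steal_time is really a small offset, not a pointer.
 * The real address is __per_cpu_offset[cpu] + offset, which is what the
 * original "movq __per_cpu_offset(,%rdi,8),%rax; cmpb $0,steal_time+X(%rax)"
 * computes.  A RIP-relative reference to such a symbol cannot work: nothing
 * is mapped at the section's link-time (zero) base address.
 */
#include <stdint.h>

extern uintptr_t __per_cpu_offset[];    /* per-CPU base addresses */

static inline void *per_cpu_ptr_sketch(uintptr_t zero_based_off, int cpu)
{
        return (void *)(__per_cpu_offset[cpu] + zero_based_off);
}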
Re: [Xen-devel] [RFC 16/22] x86/percpu: Adapt percpu for PIE support
On Tue, Jul 18, 2017 at 6:33 PM, Thomas Garnier wrote:
> Percpu uses a clever design where the .percpu ELF section has a virtual
> address of zero and the relocation code avoids relocating specific
> symbols. It makes the code simple and easily adaptable with or without
> SMP support.
>
> This design is incompatible with PIE because generated code always tries to
> access the zero virtual address relative to the default mapping address.
> It becomes impossible when KASLR is configured to go below -2G. This
> patch solves this problem by removing the zero mapping and adapting the GS
> base to be relative to the expected address. These changes are done only
> when PIE is enabled. The original implementation is kept as-is
> by default.

The reason the per-cpu section is zero-based on x86-64 is to work around
GCC hardcoding the stack protector canary at %gs:40.  So this patch is
incompatible with CONFIG_STACK_PROTECTOR.

--
Brian Gerst
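The %gs:40 constraint Brian mentions comes from the compiler: at the time of this thread, GCC's x86-64 stack protector unconditionally emitted "mov %gs:40, %reg" for the canary load. The kernel therefore had to pin a canary at offset 40 from the GS base, which zero-basing the per-cpu section makes possible. A compilable sketch of the idea (field layout illustrative, not the exact kernel source):

#include <stddef.h>

struct fixed_percpu_sketch {
        char gs_base[40];               /* padding so the canary lands at %gs:40 */
        unsigned long stack_canary;
};

_Static_assert(offsetof(struct fixed_percpu_sketch, stack_canary) == 40,
               "gcc hardcodes the canary load as mov %gs:40 on x86-64");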
Re: [Xen-devel] [RFC 21/22] x86/module: Add support for mcmodel large and PLTs
On Tue, Jul 18, 2017 at 9:35 PM, H. Peter Anvin wrote:
> On 07/18/17 15:33, Thomas Garnier wrote:
>> With PIE support and KASLR extended range, the modules may be further
>> away from the kernel than before, breaking mcmodel=kernel expectations.
>>
>> Add an option to build modules with mcmodel=large. The generated module
>> code will make no assumptions about placement in memory.
>>
>> Despite this option, modules still expect kernel functions to be within
>> 2G and generate relative calls. To solve this issue, the arm64 PLT code
>> was adapted for x86_64. When a relative relocation goes outside its range,
>> a dynamic PLT entry is used to correctly jump to the destination.
>
> Why large as opposed to medium or medium-PIC?

Or for that matter, why not small-PIC?  We aren't changing the size of
the kernel to be larger than 2G text or data.  Small-PIC would still
allow it to be placed anywhere in the address space, and would
generate far better code.

--
Brian Gerst
Re: [Xen-devel] [RFC 21/22] x86/module: Add support for mcmodel large and PLTs
On Wed, Jul 19, 2017 at 11:58 AM, Thomas Garnier wrote:
> On Tue, Jul 18, 2017 at 8:59 PM, Brian Gerst wrote:
>> On Tue, Jul 18, 2017 at 9:35 PM, H. Peter Anvin wrote:
>>> On 07/18/17 15:33, Thomas Garnier wrote:
>>>> With PIE support and KASLR extended range, the modules may be further
>>>> away from the kernel than before, breaking mcmodel=kernel expectations.
>>>>
>>>> Add an option to build modules with mcmodel=large. The generated module
>>>> code will make no assumptions about placement in memory.
>>>>
>>>> Despite this option, modules still expect kernel functions to be within
>>>> 2G and generate relative calls. To solve this issue, the arm64 PLT code
>>>> was adapted for x86_64. When a relative relocation goes outside its range,
>>>> a dynamic PLT entry is used to correctly jump to the destination.
>>>
>>> Why large as opposed to medium or medium-PIC?
>>
>> Or for that matter, why not small-PIC?  We aren't changing the size of
>> the kernel to be larger than 2G text or data.  Small-PIC would still
>> allow it to be placed anywhere in the address space, and would
>> generate far better code.
>
> My understanding was that small-PIC and medium-PIC assume that the
> module code is in the lower 2G of memory. I will do additional testing
> on the modules to confirm that.

That is only for small/medium absolute (non-PIC) code.  Think about
userspace shared libraries.  They are not limited to being mapped in
the lower 2G of the address space.

--
Brian Gerst
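The code-quality difference Brian alludes to is easy to see by compiling a trivial file under each model. This is a sketch with arbitrary symbol names; the commented assembly is typical GCC output and varies by compiler version:

/*
 * codemodel.c -- compare code models (symbol names arbitrary):
 *
 *   gcc -S -mcmodel=small codemodel.c
 *       call ext_func                    # direct rel32 call
 *   gcc -S -mcmodel=large codemodel.c
 *       movabs $ext_func, %rax           # full 64-bit absolute address
 *       call   *%rax                     # indirect call, no 2G assumption
 *   gcc -S -fpie codemodel.c
 *       RIP-relative / GOT-based references, still rel32 underneath,
 *       which is why small-PIC code is much denser than mcmodel=large.
 */
extern int ext_var;
int ext_func(int);

int caller(void)
{
        return ext_func(ext_var);
}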
Re: [Xen-devel] [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries
> +     popq %rcx
> +     popq %r11
> +     jmp entry_SYSCALL_compat_after_hwframe
>  ENDPROC(xen_syscall32_target)
>
>  /* 32-bit compat sysenter target */
>  ENTRY(xen_sysenter_target)
> -     undo_xen_syscall
> +     mov 0*8(%rsp), %rcx
> +     mov 1*8(%rsp), %r11
> +     mov 5*8(%rsp), %rsp
>       jmp entry_SYSENTER_compat
>  ENDPROC(xen_sysenter_target)

This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
64-bit PV kernel.  The 64-bit versions pass.  It gets a seg fault after
"parent: write to 0x80 (should fail)", and the fault isn't caught by
the signal handler.  It just dumps back to the shell.  The tests pass
after reverting this.

--
Brian Gerst
Re: [Xen-devel] [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries
On Mon, Aug 14, 2017 at 1:53 AM, Andy Lutomirski wrote:
> On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst wrote:
>> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski wrote:
>>>  /* Normal 64-bit system call target */
>>>  ENTRY(xen_syscall_target)
>>> -       undo_xen_syscall
>>> -       jmp entry_SYSCALL_64_after_swapgs
>>> +       popq %rcx
>>> +       popq %r11
>>> +       jmp entry_SYSCALL_64_after_hwframe
>>>  ENDPROC(xen_syscall_target)
>>>
>>>  #ifdef CONFIG_IA32_EMULATION
>>>
>>>  /* 32-bit compat syscall target */
>>>  ENTRY(xen_syscall32_target)
>>> -       undo_xen_syscall
>>> -       jmp entry_SYSCALL_compat
>>> +       popq %rcx
>>> +       popq %r11
>>> +       jmp entry_SYSCALL_compat_after_hwframe
>>>  ENDPROC(xen_syscall32_target)
>>>
>>>  /* 32-bit compat sysenter target */
>>>  ENTRY(xen_sysenter_target)
>>> -       undo_xen_syscall
>>> +       mov 0*8(%rsp), %rcx
>>> +       mov 1*8(%rsp), %r11
>>> +       mov 5*8(%rsp), %rsp
>>>         jmp entry_SYSENTER_compat
>>>  ENDPROC(xen_sysenter_target)
>>
>> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
>> 64-bit PV kernel.  The 64-bit versions pass.  It gets a seg fault after
>> "parent: write to 0x80 (should fail)", and the fault isn't caught by
>> the signal handler.  It just dumps back to the shell.  The tests pass
>> after reverting this.
>
> I can reproduce it if I emulate an AMD machine.  I can "fix" it like this:

Yes, this is an AMD processor.

> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index a8a4f4c460a6..6255e00f425e 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target)
>  ENTRY(xen_syscall32_target)
>       popq %rcx
>       popq %r11
> +     movq $__USER32_DS, 4*8(%rsp)
> +     movq $__USER32_CS, 1*8(%rsp)
> +     movq %r11, 2*8(%rsp)
>       jmp entry_SYSCALL_compat_after_hwframe
>  ENDPROC(xen_syscall32_target)
>
> but I haven't tried to diagnose precisely what's going on.
>
> Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
> to be a problem, but it kills opportunistic sysretl.  Maybe that's
> triggering a preexisting bug?

Resetting the CS/SS values worked.  Looking at the Xen hypervisor code,
EFLAGS on the stack should already be set to the value in R11, so that
part doesn't appear necessary.

Shouldn't this also be done for the 64-bit SYSCALL entry, for
consistency with previous code?

--
Brian Gerst
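For reference, the 1*8/2*8/4*8 displacements in Andy's hunk index into the hardware IRET frame that entry_SYSCALL_compat_after_hwframe expects, after the two pops have consumed the Xen-saved %rcx/%r11. A rough C view of that layout (the struct and field names are illustrative, not a kernel definition):

/* Stack as seen at the jmp, lowest address first. */
struct hwframe_sketch {
        unsigned long ip;       /* 0*8: user return address */
        unsigned long cs;       /* 1*8: overwritten with __USER32_CS */
        unsigned long flags;    /* 2*8: overwritten with %r11 (user EFLAGS) */
        unsigned long sp;       /* 3*8: user stack pointer */
        unsigned long ss;       /* 4*8: overwritten with __USER32_DS */
};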
Re: [Xen-devel] [PATCH v9 04/38] x86/CPU/AMD: Add the Secure Memory Encryption CPU feature
On Fri, Jul 7, 2017 at 9:38 AM, Tom Lendacky wrote:
> Update the CPU features to include identifying and reporting on the
> Secure Memory Encryption (SME) feature. SME is identified by CPUID
> 0x8000001f, but requires BIOS support to enable it (set bit 23 of
> MSR_K8_SYSCFG). Only show the SME feature as available if reported by
> CPUID and enabled by BIOS.
>
> Reviewed-by: Borislav Petkov
> Signed-off-by: Tom Lendacky
> ---
>  arch/x86/include/asm/cpufeatures.h |    1 +
>  arch/x86/include/asm/msr-index.h   |    2 ++
>  arch/x86/kernel/cpu/amd.c          |   13 +
>  arch/x86/kernel/cpu/scattered.c    |    1 +
>  4 files changed, 17 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 2701e5f..2b692df 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -196,6 +196,7 @@
>
>  #define X86_FEATURE_HW_PSTATE     ( 7*32+ 8) /* AMD HW-PState */
>  #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
> +#define X86_FEATURE_SME           ( 7*32+10) /* AMD Secure Memory Encryption */

Given that this feature is available only in long mode, this should be
added to disabled-features.h as disabled for 32-bit builds.

>  #define X86_FEATURE_INTEL_PPIN    ( 7*32+14) /* Intel Processor Inventory Number */
>  #define X86_FEATURE_INTEL_PT      ( 7*32+15) /* Intel Processor Trace */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 18b1623..460ac01 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -352,6 +352,8 @@
>  #define MSR_K8_TOP_MEM1                0xc001001a
>  #define MSR_K8_TOP_MEM2                0xc001001d
>  #define MSR_K8_SYSCFG                  0xc0010010
> +#define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT  23
> +#define MSR_K8_SYSCFG_MEM_ENCRYPT      BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
>  #define MSR_K8_INT_PENDING_MSG         0xc0010055
>  /* C1E active bits in int pending message */
>  #define K8_INTP_C1E_ACTIVE_MASK        0x18000000
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index bb5abe8..c47ceee 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -611,6 +611,19 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>        */
>       if (cpu_has_amd_erratum(c, amd_erratum_400))
>               set_cpu_bug(c, X86_BUG_AMD_E400);
> +
> +     /*
> +      * BIOS support is required for SME. If BIOS has not enabled SME
> +      * then don't advertise the feature (set in scattered.c)
> +      */
> +     if (cpu_has(c, X86_FEATURE_SME)) {
> +             u64 msr;
> +
> +             /* Check if SME is enabled */
> +             rdmsrl(MSR_K8_SYSCFG, msr);
> +             if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> +                     clear_cpu_cap(c, X86_FEATURE_SME);
> +     }

This should be conditional on CONFIG_X86_64.

>  }
>
>  static void init_amd_k8(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 23c2350..05459ad 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -31,6 +31,7 @@ struct cpuid_bit {
>       { X86_FEATURE_HW_PSTATE,     CPUID_EDX,  7, 0x80000007, 0 },
>       { X86_FEATURE_CPB,           CPUID_EDX,  9, 0x80000007, 0 },
>       { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
> +     { X86_FEATURE_SME,           CPUID_EAX,  0, 0x8000001f, 0 },

This should also be conditional.  We don't want to set this feature on
32-bit, even if the processor has support.

>       { 0, 0, 0, 0, 0 }
>  };

--
Brian Gerst
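A minimal sketch of the disabled-features.h change being suggested, following that file's existing convention; the real DISABLED_MASK7 definition would OR this into its existing bits, so treat the mask line as illustrative:

/* Sketch only: make X86_FEATURE_SME compile-time disabled on 32-bit.
 * Word 7 matches the cpufeatures.h hunk above. */
#ifdef CONFIG_X86_64
# define DISABLE_SME    0
#else
# define DISABLE_SME    (1 << (X86_FEATURE_SME & 31))
#endif

#define DISABLED_MASK7  (DISABLE_SME)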
Re: [Xen-devel] [PATCH v9 07/38] x86/mm: Remove phys_to_virt() usage in ioremap()
On Fri, Jul 7, 2017 at 9:39 AM, Tom Lendacky wrote:
> Currently there is a check if the address being mapped is in the ISA
> range (is_ISA_range()), and if it is, then phys_to_virt() is used to
> perform the mapping. When SME is active, the default is to add pagetable
> mappings with the encryption bit set unless specifically overridden. The
> resulting pagetable mapping from phys_to_virt() will result in a mapping
> that has the encryption bit set. With SME, the use of ioremap() is
> intended to generate pagetable mappings that do not have the encryption
> bit set through the use of the PAGE_KERNEL_IO protection value.
>
> Rather than special case the SME scenario, remove the ISA range check and
> usage of phys_to_virt() and have ISA range mappings continue through the
> remaining ioremap() path.
>
> Signed-off-by: Tom Lendacky
> ---
>  arch/x86/mm/ioremap.c |    7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 4c1b5fd..bfc3e2d 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -13,6 +13,7 @@
>  #include
>  #include
>  #include
> +#include <linux/mem_encrypt.h>
>
>  #include
>  #include
> @@ -106,12 +107,6 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>       }
>
>       /*
> -      * Don't remap the low PCI/ISA area, it's always mapped..
> -      */
> -     if (is_ISA_range(phys_addr, last_addr))
> -             return (__force void __iomem *)phys_to_virt(phys_addr);
> -
> -     /*
>        * Don't allow anybody to remap normal RAM that we're using..
>        */
>       pfn = phys_addr >> PAGE_SHIFT;

Removing this also affects 32-bit, which is more likely to access
legacy devices in this range.  Put in a check for SME instead (provided
you follow my recommendations to not set the SME feature bit on 32-bit
even when the processor supports it).

--
Brian Gerst
Re: [Xen-devel] [PATCH v9 07/38] x86/mm: Remove phys_to_virt() usage in ioremap()
On Mon, Jul 10, 2017 at 3:50 PM, Tom Lendacky wrote:
> On 7/8/2017 7:57 AM, Brian Gerst wrote:
>> On Fri, Jul 7, 2017 at 9:39 AM, Tom Lendacky wrote:
>>> Currently there is a check if the address being mapped is in the ISA
>>> range (is_ISA_range()), and if it is, then phys_to_virt() is used to
>>> perform the mapping. When SME is active, the default is to add pagetable
>>> mappings with the encryption bit set unless specifically overridden. The
>>> resulting pagetable mapping from phys_to_virt() will result in a mapping
>>> that has the encryption bit set. With SME, the use of ioremap() is
>>> intended to generate pagetable mappings that do not have the encryption
>>> bit set through the use of the PAGE_KERNEL_IO protection value.
>>>
>>> Rather than special case the SME scenario, remove the ISA range check and
>>> usage of phys_to_virt() and have ISA range mappings continue through the
>>> remaining ioremap() path.
>>>
>>> Signed-off-by: Tom Lendacky
>>> ---
>>>  arch/x86/mm/ioremap.c |    7 +--
>>>  1 file changed, 1 insertion(+), 6 deletions(-)
>>>
>>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>>> index 4c1b5fd..bfc3e2d 100644
>>> --- a/arch/x86/mm/ioremap.c
>>> +++ b/arch/x86/mm/ioremap.c
>>> @@ -13,6 +13,7 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include <linux/mem_encrypt.h>
>>>
>>>  #include
>>>  #include
>>> @@ -106,12 +107,6 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>>>     }
>>>
>>>     /*
>>> -    * Don't remap the low PCI/ISA area, it's always mapped..
>>> -    */
>>> -   if (is_ISA_range(phys_addr, last_addr))
>>> -           return (__force void __iomem *)phys_to_virt(phys_addr);
>>> -
>>> -   /*
>>>      * Don't allow anybody to remap normal RAM that we're using..
>>>      */
>>>     pfn = phys_addr >> PAGE_SHIFT;
>>
>> Removing this also affects 32-bit, which is more likely to access
>> legacy devices in this range.  Put in a check for SME instead
>
> I originally had a check for SME here in a previous version of the
> patch.  Thomas Gleixner recommended removing the check so that the code
> path was always exercised regardless of the state of SME in order to
> better detect issues:
>
> http://marc.info/?l=linux-kernel&m=149803067811436&w=2
>
> Thanks,
> Tom

Looking a bit closer, this shortcut doesn't set the caching
attributes.  So it's probably best to get rid of it anyways.  Also
note, there is a corresponding check in iounmap().

--
Brian Gerst
Re: [Xen-devel] [PATCH v9 04/38] x86/CPU/AMD: Add the Secure Memory Encryption CPU feature
On Mon, Jul 10, 2017 at 3:41 PM, Tom Lendacky wrote:
> On 7/8/2017 7:50 AM, Brian Gerst wrote:
>> On Fri, Jul 7, 2017 at 9:38 AM, Tom Lendacky wrote:
>>> Update the CPU features to include identifying and reporting on the
>>> Secure Memory Encryption (SME) feature. SME is identified by CPUID
>>> 0x8000001f, but requires BIOS support to enable it (set bit 23 of
>>> MSR_K8_SYSCFG). Only show the SME feature as available if reported by
>>> CPUID and enabled by BIOS.
>>>
>>> Reviewed-by: Borislav Petkov
>>> Signed-off-by: Tom Lendacky
>>> ---
>>>  arch/x86/include/asm/cpufeatures.h |    1 +
>>>  arch/x86/include/asm/msr-index.h   |    2 ++
>>>  arch/x86/kernel/cpu/amd.c          |   13 +
>>>  arch/x86/kernel/cpu/scattered.c    |    1 +
>>>  4 files changed, 17 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>>> index 2701e5f..2b692df 100644
>>> --- a/arch/x86/include/asm/cpufeatures.h
>>> +++ b/arch/x86/include/asm/cpufeatures.h
>>> @@ -196,6 +196,7 @@
>>>
>>>  #define X86_FEATURE_HW_PSTATE     ( 7*32+ 8) /* AMD HW-PState */
>>>  #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
>>> +#define X86_FEATURE_SME           ( 7*32+10) /* AMD Secure Memory Encryption */
>>
>> Given that this feature is available only in long mode, this should be
>> added to disabled-features.h as disabled for 32-bit builds.
>
> I can add that.  If the series needs a re-spin then I'll include this
> change in the series, otherwise I can send a follow-on patch to handle
> the feature for 32-bit builds if that works.
>
>>>  #define X86_FEATURE_INTEL_PPIN    ( 7*32+14) /* Intel Processor Inventory Number */
>>>  #define X86_FEATURE_INTEL_PT      ( 7*32+15) /* Intel Processor Trace */
>>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>>> index 18b1623..460ac01 100644
>>> --- a/arch/x86/include/asm/msr-index.h
>>> +++ b/arch/x86/include/asm/msr-index.h
>>> @@ -352,6 +352,8 @@
>>>  #define MSR_K8_TOP_MEM1                0xc001001a
>>>  #define MSR_K8_TOP_MEM2                0xc001001d
>>>  #define MSR_K8_SYSCFG                  0xc0010010
>>> +#define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT  23
>>> +#define MSR_K8_SYSCFG_MEM_ENCRYPT      BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
>>>  #define MSR_K8_INT_PENDING_MSG         0xc0010055
>>>  /* C1E active bits in int pending message */
>>>  #define K8_INTP_C1E_ACTIVE_MASK        0x18000000
>>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>>> index bb5abe8..c47ceee 100644
>>> --- a/arch/x86/kernel/cpu/amd.c
>>> +++ b/arch/x86/kernel/cpu/amd.c
>>> @@ -611,6 +611,19 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>>>      */
>>>     if (cpu_has_amd_erratum(c, amd_erratum_400))
>>>             set_cpu_bug(c, X86_BUG_AMD_E400);
>>> +
>>> +   /*
>>> +    * BIOS support is required for SME. If BIOS has not enabled SME
>>> +    * then don't advertise the feature (set in scattered.c)
>>> +    */
>>> +   if (cpu_has(c, X86_FEATURE_SME)) {
>>> +           u64 msr;
>>> +
>>> +           /* Check if SME is enabled */
>>> +           rdmsrl(MSR_K8_SYSCFG, msr);
>>> +           if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
>>> +                   clear_cpu_cap(c, X86_FEATURE_SME);
>>> +   }
>>
>> This should be conditional on CONFIG_X86_64.
>
> If I make the scattered feature support conditional on CONFIG_X86_64
> (based on comment below) then cpu_has() will always be false unless
> CONFIG_X86_64 is enabled.  So this won't need to be wrapped by the
> #ifdef.

If you change it to use cpu_feature_enabled(), gcc will see that it is
disabled and eliminate the dead code at compile time.

>>>  }
>>>
>>>  static void init_amd_k8(struct cpuinfo_x86 *c)
>>> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
>>> index 23c2350..05459ad 100644
>>> --- a/arch/x86/kernel/cpu/scattered.c
>>> +++ b/arch/x86/kernel/cpu/scattered.c
>>> @@ -31,6 +31,7 @@ struct cpuid_bit {
>>>     { X86_FEATURE_HW_PSTATE,     CPUID_EDX,  7, 0x80000007, 0 },
>>>     { X86_FEATURE_CPB,           CPUID_EDX,  9, 0x80000007, 0 },
>>>     { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
>>> +   { X86_FEATURE_SME,           CPUID_EAX,  0, 0x8000001f, 0 },
>>
>> This should also be conditional.  We don't want to set this feature on
>> 32-bit, even if the processor has support.
>
> Can do.  See comment above about re-spin vs. follow-on patch.
>
> Thanks,
> Tom

A followup patch will be OK if there is no code that will get confused
by the SME bit being present but not active.

--
Brian Gerst
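A sketch of the cpu_feature_enabled() variant Brian suggests: the helper folds the disabled-features masks in at compile time, so with X86_FEATURE_SME in DISABLED_MASK7 on 32-bit, gcc drops the whole block as dead code and no #ifdef is needed. The surrounding code is taken from the patch above:

	/* Sketch, assuming X86_FEATURE_SME is in disabled-features.h for
	 * 32-bit builds: cpu_feature_enabled() is a compile-time constant
	 * false there, so this block compiles away entirely. */
	if (cpu_feature_enabled(X86_FEATURE_SME)) {
		u64 msr;

		rdmsrl(MSR_K8_SYSCFG, msr);
		if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
			clear_cpu_cap(c, X86_FEATURE_SME);
	}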
Re: [Xen-devel] [PATCH v9 07/38] x86/mm: Remove phys_to_virt() usage in ioremap()
On Tue, Jul 11, 2017 at 4:35 AM, Arnd Bergmann wrote:
> On Tue, Jul 11, 2017 at 6:58 AM, Brian Gerst wrote:
>> On Mon, Jul 10, 2017 at 3:50 PM, Tom Lendacky wrote:
>>> On 7/8/2017 7:57 AM, Brian Gerst wrote:
>>>> On Fri, Jul 7, 2017 at 9:39 AM, Tom Lendacky
>>>
>>> I originally had a check for SME here in a previous version of the
>>> patch.  Thomas Gleixner recommended removing the check so that the code
>>> path was always exercised regardless of the state of SME in order to
>>> better detect issues:
>>>
>>> http://marc.info/?l=linux-kernel&m=149803067811436&w=2
>>
>> Looking a bit closer, this shortcut doesn't set the caching
>> attributes.  So it's probably best to get rid of it anyways.  Also
>> note, there is a corresponding check in iounmap().

Perhaps the iounmap() check should be kept for now for safety, since
some drivers (vga16fb for example) call iounmap() blindly even if the
mapping wasn't returned from ioremap().  Maybe add a warning when an
ISA address is passed to iounmap().

> Could that cause regressions if a driver relies on (write-through)
> cacheable access to the VGA frame buffer RAM or a read-only
> cached access to an option ROM but now gets uncached access?

Yes, there could be some surprises in drivers that use the normal
ioremap() call, which is uncached, but were expecting the default
write-through mapping.

> I also tried to find out whether we can stop mapping the ISA MMIO
> area into the linear mapping, but at least the VGA code uses
> VGA_MAP_MEM() to get access to the same pointers.  I'm pretty
> sure this got copied incorrectly into most other architectures, but
> it is definitely still used on x86 with vga16fb/vgacon/mdacon.

Changing VGA_MAP_MEM() to use ioremap_wt() would take care of that.
Although, looking at mdacon/vgacon, they don't have support for
unmapping the frame buffer if they are built modular.

--
Brian Gerst
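As a sketch of that suggestion, assuming ioremap_wt() is acceptable for the legacy VGA window, the x86 define would change roughly like this (illustrative, not a tested patch, and as noted it would need matching unmap support in the modular drivers):

/* arch/x86/include/asm/vga.h currently maps through the linear mapping:
 *     #define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x)
 * A write-through ioremap variant would look like: */
#define VGA_MAP_MEM(x, s) ((unsigned long)ioremap_wt(x, s))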
Re: [Xen-devel] [PATCH v9 07/38] x86/mm: Remove phys_to_virt() usage in ioremap()
On Tue, Jul 11, 2017 at 11:02 AM, Tom Lendacky wrote:
> On 7/10/2017 11:58 PM, Brian Gerst wrote:
>> On Mon, Jul 10, 2017 at 3:50 PM, Tom Lendacky wrote:
>>> On 7/8/2017 7:57 AM, Brian Gerst wrote:
>>>> On Fri, Jul 7, 2017 at 9:39 AM, Tom Lendacky wrote:
>>>>> Currently there is a check if the address being mapped is in the ISA
>>>>> range (is_ISA_range()), and if it is, then phys_to_virt() is used to
>>>>> perform the mapping. When SME is active, the default is to add pagetable
>>>>> mappings with the encryption bit set unless specifically overridden. The
>>>>> resulting pagetable mapping from phys_to_virt() will result in a mapping
>>>>> that has the encryption bit set. With SME, the use of ioremap() is
>>>>> intended to generate pagetable mappings that do not have the encryption
>>>>> bit set through the use of the PAGE_KERNEL_IO protection value.
>>>>>
>>>>> Rather than special case the SME scenario, remove the ISA range check and
>>>>> usage of phys_to_virt() and have ISA range mappings continue through the
>>>>> remaining ioremap() path.
>>>>>
>>>>> Signed-off-by: Tom Lendacky
>>>>> ---
>>>>>  arch/x86/mm/ioremap.c |    7 +--
>>>>>  1 file changed, 1 insertion(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>>>>> index 4c1b5fd..bfc3e2d 100644
>>>>> --- a/arch/x86/mm/ioremap.c
>>>>> +++ b/arch/x86/mm/ioremap.c
>>>>> @@ -13,6 +13,7 @@
>>>>>  #include
>>>>>  #include
>>>>>  #include
>>>>> +#include <linux/mem_encrypt.h>
>>>>>
>>>>>  #include
>>>>>  #include
>>>>> @@ -106,12 +107,6 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>>>>>       }
>>>>>
>>>>>       /*
>>>>> -      * Don't remap the low PCI/ISA area, it's always mapped..
>>>>> -      */
>>>>> -     if (is_ISA_range(phys_addr, last_addr))
>>>>> -             return (__force void __iomem *)phys_to_virt(phys_addr);
>>>>> -
>>>>> -     /*
>>>>>        * Don't allow anybody to remap normal RAM that we're using..
>>>>>        */
>>>>>       pfn = phys_addr >> PAGE_SHIFT;
>>>>
>>>> Removing this also affects 32-bit, which is more likely to access
>>>> legacy devices in this range.  Put in a check for SME instead
>>>
>>> I originally had a check for SME here in a previous version of the
>>> patch.  Thomas Gleixner recommended removing the check so that the code
>>> path was always exercised regardless of the state of SME in order to
>>> better detect issues:
>>>
>>> http://marc.info/?l=linux-kernel&m=149803067811436&w=2
>>>
>>> Thanks,
>>> Tom
>>
>> Looking a bit closer, this shortcut doesn't set the caching
>> attributes.  So it's probably best to get rid of it anyways.  Also
>> note, there is a corresponding check in iounmap().
>
> Good catch.  I'll update the patch to include the removal of the ISA
> checks in the iounmap() path as well.

I now think it should be kept but also emit a warning, at least for the
short term.  There is bad code out there (vga16fb for example) that
calls iounmap() blindly without calling ioremap() first.  We don't want
to actually follow through with the unmap on the linear mapping.

--
Brian Gerst
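A minimal sketch of "keep the check but warn": short-circuit ISA-range addresses in iounmap() so callers that blindly unmap the legacy VGA window cannot tear down the linear mapping, but make some noise so they get fixed. The range comparison mirrors the check that already exists in arch/x86/mm/ioremap.c; the WARN_ONCE is the addition:

void iounmap(volatile void __iomem *addr)
{
	/*
	 * The ISA range is never ioremap()ed through the normal path,
	 * so never actually unmap it; warn so the bad caller is visible.
	 */
	if ((void __force *)addr >= phys_to_virt(ISA_START_ADDRESS) &&
	    (void __force *)addr < phys_to_virt(ISA_END_ADDRESS)) {
		WARN_ONCE(1, "iounmap() called on ISA range address %p\n",
			  addr);
		return;
	}

	/* ... existing lookup and teardown of the vmap area ... */
}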
Re: [Xen-devel] [PATCH 1/3] x86/xen: Avoid fast syscall path for Xen PV guests
On Wed, Nov 18, 2015 at 3:21 PM, Andy Lutomirski wrote:
> On Wed, Nov 18, 2015 at 12:06 PM, Boris Ostrovsky wrote:
>> After the 32-bit syscall rewrite, and specifically after commit 5f310f739b4c
>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
>> it's not pt_regs).
>>
>> Since we end up calling xen_iret from xen_sysexit we don't need to fix
>> up the stack and instead follow entry_SYSENTER_32's IRET path directly
>> to xen_iret.
>>
>> We can do the same thing for compat mode even though the stack does not
>> need to be fixed.  This will allow us to drop the usergs_sysret32 paravirt
>> op (in the subsequent patch).
>
> Looks generally quite nice.  Minor comments below:
>
>> --- a/arch/x86/entry/entry_32.S
>> +++ b/arch/x86/entry/entry_32.S
>> @@ -308,7 +308,8 @@ sysenter_past_esp:
>>
>>       movl    %esp, %eax
>>       call    do_fast_syscall_32
>> -     testl   %eax, %eax
>> +     /* XEN PV guests always use IRET path */
>> +     ALTERNATIVE "testl %eax, %eax", "xor %eax, %eax", X86_FEATURE_XENPV
>>       jz      .Lsyscall_32_done
>
> Could we make this a little less subtle:
>
> ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", "jmp
> .Lsyscall_32_done", X86_FEATURE_XENPV
>
> Borislav, what do you think?
>
> Ditto for the others.

Can you just add !xen_pv_domain() to the opportunistic SYSRET check
instead?  Bury the alternatives in that macro, i.e.
static_cpu_has(X86_FEATURE_XENPV).  That would likely benefit other
code as well.

--
Brian Gerst
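A sketch of Brian's alternative, assuming the check lands in the C exit path that decides between SYSRET and IRET; the helper name is hypothetical, and static_cpu_has() compiles down to a patched jump so bare metal pays nothing for it:

/* Hypothetical helper in the opportunistic-SYSRET decision, sketch only. */
static bool may_use_fast_return(void)
{
	if (static_cpu_has(X86_FEATURE_XENPV))
		return false;	/* Xen PV guests must take the IRET path */

	/* ... existing register/flags sanity checks ... */
	return true;
}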
Re: [Xen-devel] [PATCH v2] xen/x86: Zero out .bss for PV guests
On Wed, Feb 24, 2016 at 10:19 AM, Boris Ostrovsky wrote:
> Baremetal kernels clear .bss early in the boot but Xen PV guests don't
> execute that code.  They have been able to run without problems because
> the Xen domain builder happens to give out zeroed pages.  However, since
> this is not really guaranteed, .bss should be explicitly cleared.
>
> (Since we introduce macros for specifying 32- and 64-bit registers we
> can get rid of ifdefs in startup_xen())
>
> Signed-off-by: Boris Ostrovsky
> Cc: stable@vger.kernel.org
> ---
>  arch/x86/xen/xen-head.S | 29 ++---
>  1 file changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
> index b65f59a..2af87d1 100644
> --- a/arch/x86/xen/xen-head.S
> +++ b/arch/x86/xen/xen-head.S
> @@ -35,16 +35,31 @@
>  #define PVH_FEATURES (0)
>  #endif
>
> -     __INIT
> -ENTRY(startup_xen)
> -     cld
>  #ifdef CONFIG_X86_32
> -     mov %esi,xen_start_info
> -     mov $init_thread_union+THREAD_SIZE,%esp
> +#define REG(register) %e##register
> +#define WSIZE_SHIFT    2
> +#define STOS           stosl
>  #else
> -     mov %rsi,xen_start_info
> -     mov $init_thread_union+THREAD_SIZE,%rsp
> +#define REG(register) %r##register
> +#define WSIZE_SHIFT    3
> +#define STOS           stosq
>  #endif
> +
> +     __INIT
> +ENTRY(startup_xen)
> +     cld
> +
> +     /* Clear .bss */
> +     xor REG(ax),REG(ax)
> +     mov $__bss_start,REG(di)
> +     mov $__bss_stop,REG(cx)
> +     sub REG(di),REG(cx)
> +     shr $WSIZE_SHIFT,REG(cx)
> +     rep STOS
> +
> +     mov REG(si),xen_start_info
> +     mov $init_thread_union+THREAD_SIZE,REG(sp)
> +
>       jmp xen_start_kernel
>
>  __FINIT

Use the macros in <asm/asm.h> instead of defining your own.  Also,
xorl %eax,%eax is good for 64-bit too, since the upper bits are
cleared.

--
Brian Gerst
Re: [Xen-devel] [PATCH v3 1/2] xen/x86: Zero out .bss for PV guests
On Thu, Feb 25, 2016 at 10:16 AM, Boris Ostrovsky wrote:
> Baremetal kernels clear .bss early in the boot but Xen PV guests don't
> execute that code.  They have been able to run without problems because
> the Xen domain builder happens to give out zeroed pages.  However, since
> this is not really guaranteed, .bss should be explicitly cleared.
>
> Signed-off-by: Boris Ostrovsky
> Cc: stable@vger.kernel.org
> ---
>  arch/x86/xen/xen-head.S | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
> index b65f59a..11dbe49 100644
> --- a/arch/x86/xen/xen-head.S
> +++ b/arch/x86/xen/xen-head.S
> @@ -38,6 +38,20 @@
>       __INIT
>  ENTRY(startup_xen)
>       cld
> +
> +#ifdef CONFIG_X86_32
> +#define WSIZE_SHIFT    2
> +#else
> +#define WSIZE_SHIFT    3
> +#endif
> +     /* Clear .bss */
> +     xor %eax,%eax
> +     mov $__bss_start, %__ASM_REG(di)
> +     mov $__bss_stop, %__ASM_REG(cx)
> +     sub %__ASM_REG(di), %__ASM_REG(cx)
> +     shr $WSIZE_SHIFT, %__ASM_REG(cx)
> +     rep __ASM_SIZE(stos)
> +
>  #ifdef CONFIG_X86_32
>       mov %esi,xen_start_info
>       mov $init_thread_union+THREAD_SIZE,%esp

Better, but can still be improved.  Replace WSIZE_SHIFT with
__ASM_SEL(2, 3), and use the macros for the registers (i.e. __ASM_DI).

--
Brian Gerst
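For reference, the C equivalent of what the assembly does, similar in spirit to the clear_bss() helper in arch/x86/kernel/head64.c. This is illustrative only: the real clearing has to happen in asm before any C code that touches .bss runs.

#include <string.h>

/* Section boundaries provided by the linker script. */
extern char __bss_start[], __bss_stop[];

static void clear_bss(void)
{
	memset(__bss_start, 0, __bss_stop - __bss_start);
}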
Re: [Xen-devel] [PATCH v3 0/2] Clear .bss for PV guests
On Fri, Feb 26, 2016 at 8:51 AM, Boris Ostrovsky wrote:
> On 02/26/2016 05:53 AM, Roger Pau Monné wrote:
>> On 25/2/16 16:16, Boris Ostrovsky wrote:
>>> PV guests need to have their .bss zeroed out since it is not guaranteed
>>> to be cleared by Xen's domain builder
>>
>> I guess I'm missing something, but elf_load_image (in libelf-loader.c)
>> seems to be able to clear segments (it will zero the memory between
>> p_paddr + p_filesz and p_paddr + p_memsz) while loading the ELF into
>> memory, so if the program headers are correctly set up the .bss should
>> be zeroed out AFAICT.
>
> Right, but I don't think this is guaranteed.  It's uninitialized data so
> in principle it can be anything.
>
> The ELF spec says "the system initializes the data with zero when the
> program begins to run", which I read as: it's up to the runtime, not the
> loader, to do so.
>
> And since the kernel does it explicitly on the baremetal path I think
> it's a good idea for PV to do the same.

It does it on bare metal because bzImage is a raw binary image, not ELF.

--
Brian Gerst
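The p_filesz/p_memsz behavior Roger describes is the standard ELF loading convention; a minimal sketch of what a loader like Xen's elf_load_image does per PT_LOAD segment (names and the standalone framing are illustrative):

#include <string.h>
#include <elf.h>

/* Copy the file-backed part of a segment, then zero the tail up to
 * p_memsz -- that tail is where .bss lives.  'dest' is the segment's
 * destination address, 'src' the segment bytes from the file. */
static void load_segment(void *dest, const void *src, const Elf64_Phdr *ph)
{
	memcpy(dest, src, ph->p_filesz);
	memset((char *)dest + ph->p_filesz, 0,
	       ph->p_memsz - ph->p_filesz);
}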