Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()

2025-04-09 Thread Dave Hansen
On 4/9/25 12:53, Ingo Molnar wrote: >>> What would folks think about "wrmsr64()"? It's writing a 64-bit >>> value to an MSR and there are a lot of functions in the kernel that >>> are named with the argument width in bits. >> Personally, I hate the extra verbosity, mostly visual, since numerals

Re: [PATCH] selftests/sgx: Fix an enclave built with extended instructions

2025-04-09 Thread Dave Hansen
On 4/9/25 09:55, Vladis Dronov wrote: ... > Fix this by adding "-mno-avx" to ENCL_CFLAGS in Makefile. Add some comments > about this to code locations where enclave's xfrm field is set. > > Suggested-by: Dave Hansen > Signed-off-by: Vladis Dronov First of all, th

Re: [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel()

2025-04-08 Thread Dave Hansen
On 4/8/25 10:40, Matthew Wilcox wrote: > I think at this point in Kevin's series, we don't call the ctor for > these pages, so we never set PageTable() on them. I could be wrong; > as Kevin says, this is all very twisty and confusing with exceptions and > exceptions to exceptions. This series sho

Re: [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel()

2025-04-08 Thread Dave Hansen
On 4/8/25 09:37, Matthew Wilcox wrote: > On Tue, Apr 08, 2025 at 08:22:47AM -0700, Dave Hansen wrote: >> Are there any tests for folio_test_pgtable() at free_page() time? If we >> had that, it would make it less likely that another free_page() user >> could sneak in without c

Re: [PATCH v2 02/12] x86: pgtable: Always use pte_free_kernel()

2025-04-08 Thread Dave Hansen
ne and adding consistency is nice. Are there any tests for folio_test_pgtable() at free_page() time? If we had that, it would make it less likely that another free_page() user could sneak in without calling the destructor. Acked-by: Dave Hansen

Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()

2025-04-04 Thread Dave Hansen
On 3/31/25 22:53, Xin Li wrote:
> Per "struct msr" defined in arch/x86/include/asm/shared/msr.h:
>
> struct msr {
>     union {
>         struct {
>             u32 l;
>             u32 h;
>         };
>         u64 q;
>     };
> };
>
> Prob
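
As a reading aid for the layout quoted above, here is a minimal userspace sketch of how such a union lets one MSR value be viewed either as a single 64-bit quantity or as two 32-bit halves. The typedefs and the helpers `msr_join()` and `msr_from_u64()` are illustrative names, not kernel API, and treating `l` as the low half assumes a little-endian machine such as x86.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Same shape as the struct quoted in the thread, rebuilt for userspace. */
struct msr {
	union {
		struct {
			u32 l;	/* low 32 bits on little-endian (x86) */
			u32 h;	/* high 32 bits on little-endian (x86) */
		};
		u64 q;		/* the whole 64-bit value */
	};
};

/* The two halves must overlay the 64-bit member exactly. */
static_assert(sizeof(struct msr) == sizeof(u64), "halves must overlay q");

/* Hypothetical helper: combine two 32-bit halves with shifts. */
static inline u64 msr_join(u32 low, u32 high)
{
	return ((u64)high << 32) | low;
}

/* Hypothetical helper: wrap a 64-bit value so callers can pick a view. */
static inline struct msr msr_from_u64(u64 val)
{
	struct msr m;

	m.q = val;
	return m;
}
```

The shift-based join is independent of byte order, whereas reading `.l` and `.h` back from the union is only meaningful under the little-endian assumption stated above.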

Re: [PATCH v2] x86/xen: fix balloon target initialization for PVH dom0

2025-04-04 Thread Dave Hansen
On 4/4/25 06:34, Roger Pau Monne wrote: > Much like a6aa4eb994ee, the code in this changeset should have been part of > 38620fc4e893. > > Fixes: a6aa4eb994ee ('xen/x86: add extra pages to unpopulated-alloc if > available') > Signed-off-by: Roger Pau Monné I don't see a cc:stable@ on there. Was

Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()

2025-04-02 Thread Dave Hansen
On 3/31/25 22:53, Xin Li wrote:
> Per "struct msr" defined in arch/x86/include/asm/shared/msr.h:
>
> struct msr {
>     union {
>         struct {
>             u32 l;
>             u32 h;
>         };
>         u64 q;
>     };
> };
>
> Prob

Re: [PATCH 00/13] arch, mm: reduce code duplication in mem_init()

2025-03-15 Thread Dave Hansen
On 3/6/25 10:51, Mike Rapoport wrote: > 53 files changed, 151 insertions(+), 618 deletions(-) > delete mode 100644 arch/x86/include/asm/numa_32.h > delete mode 100644 arch/x86/mm/highmem_32.c Holy cow, nice work. For the x86 bits: Acked-by: Dave Hansen

Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses

2025-03-12 Thread Dave Hansen
On 3/4/25 11:16, Dave Hansen wrote: > On 3/4/25 10:49, Eric W. Biederman wrote: >> How goes the work to fix this horrifically slow firmware interface? > The firmware interface isn't actually all that slow. Hey Eric, I've noticed a trend on this series. It seems like e

[PATCH] Avoid accidental use of BayesStore::SQL with MySQL or Postgres

2025-03-07 Thread Dave Hansen
The longer story is in here: https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8315 But the short version is: I have a spamassassin config that is a bit old. I'd say I originally wrote it in ~2005. It's been happily using: bayes_store_module Mail::SpamAssassin::BayesStore::SQL

Re: [PATCH 00/13] arch, mm: reduce code duplication in mem_init()

2025-03-06 Thread Dave Hansen
On 3/6/25 10:51, Mike Rapoport wrote: > 53 files changed, 151 insertions(+), 618 deletions(-) > delete mode 100644 arch/x86/include/asm/numa_32.h > delete mode 100644 arch/x86/mm/highmem_32.c Holy cow, nice work. For the x86 bits: Acked-by: Dave Hansen

Re: [PATCH v2] arch/x86: Fix size overflows in sgx_encl_create()

2025-03-04 Thread Dave Hansen
On 3/4/25 16:19, Jarkko Sakkinen wrote: > On Tue, Mar 04, 2025 at 04:18:03PM -0800, Dave Hansen wrote: >> On 3/4/25 16:06, Jarkko Sakkinen wrote: >>> + /* >>> +* This is a micro-architectural requirement. ECREATE would detect this >>> +* too without

Re: [PATCH v2] arch/x86: Fix size overflows in sgx_encl_create()

2025-03-04 Thread Dave Hansen
On 3/4/25 16:06, Jarkko Sakkinen wrote: > + /* > + * This is a micro-architectural requirement. ECREATE would detect this > + * too without mentionable overhead but this check guarantees also that > + * the space calculations for EPC and shmem allocations never overflow. > +

Re: [PATCH] arch/x86: Fix size overflows in sgx_encl_create()

2025-03-04 Thread Dave Hansen
On 3/4/25 14:56, Jarkko Sakkinen wrote: > The total size calculated for EPC can overflow u64 given the added up page > for SECS. Further, the total size calculated for shmem can overflow even > when the EPC size stays within limits of u64, given that it adds the extra > space for 128 byte PCMD str
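
The overflow described in the quoted commit message (an enclave size plus one extra page exceeding what a u64 can hold) can be illustrated with a small checked addition. This is only a sketch under stated assumptions: `sketch_epc_total()` and `SKETCH_PAGE_SIZE` are hypothetical names, the 4 KiB page size is assumed, and this is not the check the patch itself adds.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/*
 * Hypothetical helper, not the code from the patch: compute the
 * enclave size plus one extra page and report overflow instead of
 * silently wrapping around.
 */
static bool sketch_epc_total(uint64_t encl_size, uint64_t *total)
{
	if (encl_size > UINT64_MAX - SKETCH_PAGE_SIZE)
		return false;	/* the sum would not fit in 64 bits */

	*total = encl_size + SKETCH_PAGE_SIZE;
	return true;
}
```

Comparing against `UINT64_MAX - SKETCH_PAGE_SIZE` before adding avoids relying on wraparound to detect the problem after the fact.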

Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses

2025-03-04 Thread Dave Hansen
On 3/4/25 10:49, Eric W. Biederman wrote: > How goes the work to fix this horrifically slow firmware interface? The firmware interface isn't actually all that slow. The fundamental requirement is that confidential computing environments need to be handed memory in a known-benign state. For AMD SE

Re: [PATCH v6 1/7] mseal, system mappings: kernel config and header change

2025-02-24 Thread Dave Hansen
On 2/24/25 10:55, Kees Cook wrote: >> That logic is reasonable. But it's different from the _vast_ majority of >> other flags. >> >> So what justifies VM_SEALED being so different? It's leading to pretty >> objectively ugly code in this series. > Note that VM_SEALED is the "is this VMA sealed?" bit

Re: [PATCH v6 1/7] mseal, system mappings: kernel config and header change

2025-02-24 Thread Dave Hansen
On 2/24/25 10:44, Jeff Xu wrote:
> For example:
> Consider the case below in src/third_party/kernel/v6.6/fs/proc/task_mmu.c,
>
> #ifdef CONFIG_64BIT
> [ilog2(VM_SEALED)] = "sl",
> #endif
>
> Redefining VM_SEALED to VM_NONE for 32 bit won't detect the problem
> in case that "#ifdef CONFIG_64BIT"

Re: [PATCH v6 1/7] mseal, system mappings: kernel config and header change

2025-02-24 Thread Dave Hansen
On 2/24/25 09:45, jef...@chromium.org wrote:
> +/*
> + * mseal of userspace process's system mappings.
> + */
> +#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
> +#define MSEAL_SYSTEM_MAPPINGS_VM_FLAG VM_SEALED
> +#else
> +#define MSEAL_SYSTEM_MAPPINGS_VM_FLAG VM_NONE
> +#endif

This ends up loo
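
The quoted hunk selects a flag at build time so that callers can OR it in unconditionally. A minimal standalone sketch of that pattern follows. The numeric values of `VM_NONE` and `VM_SEALED` are assumptions for illustration, and `system_mapping_flags()` is a hypothetical caller, not a function from the series.

```c
/* Assumed values for illustration; the real ones live in kernel mm headers. */
#define VM_NONE		0x0ULL
#define VM_SEALED	(1ULL << 63)

/* Same selection logic as the quoted hunk. */
#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
#define MSEAL_SYSTEM_MAPPINGS_VM_FLAG	VM_SEALED
#else
#define MSEAL_SYSTEM_MAPPINGS_VM_FLAG	VM_NONE
#endif

/* Hypothetical caller: OR-ing in the flag is a no-op when the option is off. */
static unsigned long long system_mapping_flags(unsigned long long vm_flags)
{
	return vm_flags | MSEAL_SYSTEM_MAPPINGS_VM_FLAG;
}
```

Building with `-DCONFIG_MSEAL_SYSTEM_MAPPINGS` switches the helper from passing flags through unchanged to setting the sealed bit, with no `#ifdef` at the call site.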

Re: [PATCH V5 2/4] x86/tdx: Route safe halt execution via tdx_safe_halt()

2025-02-20 Thread Dave Hansen
On 2/20/25 13:16, Vishal Annapurve wrote: > Direct HLT instruction execution causes #VEs for TDX VMs which is routed > to hypervisor via TDCALL. safe_halt() routines execute HLT in STI-shadow > so IRQs need to remain disabled until the TDCALL to ensure that pending > IRQs are correctly treated as w

Re: [PATCH V5 1/4] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT

2025-02-20 Thread Dave Hansen
On 2/20/25 13:16, Vishal Annapurve wrote: > Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest > like platforms, move HLT and SAFE_HLT paravirt calls under > CONFIG_PARAVIRT. I guess it's just one patch, but doesn't this expose CONFIG_PARAVIRT=y users to what _was_ specific to CONFIG_

Re: [PATCH] perf/x86/rapl: Fix PP1 event for Intel Meteor/Lunar Lake

2025-02-20 Thread Dave Hansen
On 2/20/25 10:27, Lucas De Marchi wrote: > On Thu, Feb 20, 2025 at 08:28:01AM -0800, Dave Hansen wrote: >> On 2/20/25 07:36, Lucas De Marchi wrote: >>> On some boots the read of MSR_PP1_ENERGY_STATUS msr returns 0, causing >>> perf_msr_probe() to make the power/events/e

Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-20 Thread Dave Hansen
On 2/20/25 09:10, Valentin Schneider wrote: >> The LDT and maybe the PEBS buffers are the only implicit supervisor >> accesses to vmalloc()'d memory that I can think of. But those are both >> handled specially and shouldn't ever get zapped while in use. The LDT >> replacement has its own IPIs separ

Re: [PATCH] perf/x86/rapl: Fix PP1 event for Intel Meteor/Lunar Lake

2025-02-20 Thread Dave Hansen
On 2/20/25 07:36, Lucas De Marchi wrote: > On some boots the read of MSR_PP1_ENERGY_STATUS msr returns 0, causing > perf_msr_probe() to make the power/events/energy-gpu event non-visible. > When that happens, the msr always read 0 until the graphics module (i915 > for Meteor Lake, xe for Lunar Lake

Re: [PATCH v2 1/1] kexec_core: Accept unaccepted kexec segments' destination addresses

2025-02-19 Thread Dave Hansen
we've got at least one end user[1] that seems to think unaccepted memory fits their needs. This bug can _probably_ be fixed in arch/x86 as well, but having the solution in general code seems like the right place to me: Acked-by: Dave Hansen Andrew, it seems like a lot of kexec work flow

Re: [PATCH 4/7] x86: Remove custom definition of mk_pte()

2025-02-19 Thread Dave Hansen
al(pgprot) & (_PAGE_DIRTY | _PAGE_RW)) == > + _PAGE_DIRTY); Looks sane to me. Good riddance to unnecessary arch-specific code. Acked-by: Dave Hansen Just one note (in case anyone ever trips over that WARN_ON_ONCE()): This is a problem with the existing code and with y

Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-19 Thread Dave Hansen
On 2/19/25 07:13, Valentin Schneider wrote: >> Maybe I missed part of the discussion though. Is VMEMMAP your only >> concern? I would have guessed that the more generic vmalloc() >> functionality would be harder to pin down. > Urgh, that'll teach me to send emails that late - I did indeed mean the

Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-19 Thread Dave Hansen
On 2/19/25 09:08, Joel Fernandes wrote: >> Pretty much so yeah. That is, *if* there such a vmalloc'd address access in >> early entry code - testing says it's not the case, but I haven't found a >> way to instrumentally verify this. > Ok, thanks for confirming. Maybe there is an address sanitizer w

Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-18 Thread Dave Hansen
On 2/18/25 14:40, Valentin Schneider wrote: >> In practice, it's mostly limited like that. >> >> Architecturally, there are no promises from the CPU. It is within its >> rights to cache anything from the page tables at any time. If it's in >> the CR3 tree, it's fair game. >> > So what if the VMEMMA

Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses

2025-02-14 Thread Dave Hansen
On 2/14/25 05:46, Kirill A. Shutemov wrote: >> It sounds like you're advocating for the "slow guest boot" option. >> Kirill, can you remind us how fast a guest boots to the shell for >> modestly-sized (say 256GB) memory with "accept_memory=eager" versus >> "accept_memory=lazy"? IIRC, it was a prett

Re: [PATCH v2 1/1] kexec_core: Accept unaccepted kexec segments' destination addresses

2025-02-13 Thread Dave Hansen
On 12/13/24 01:54, Yan Zhao wrote: > + /* > + * The destination addresses are searched from system RAM rather than > + * being allocated from the buddy allocator, so they are not guaranteed > + * to be accepted by the current kernel. Accept the destination > + * addresses b

Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses

2025-02-13 Thread Dave Hansen
On 1/13/25 06:59, Eric W. Biederman wrote: ... > I have a new objection. I believe ``unaccepted memory'' and especially > lazily initialized ``unaccepted memory'' is an information leak that > could defeat the purpose of encrypted memory. For that reason I have > Cc'd the security list. I don't

Re: [PATCH v2 6/6] selftests/mm: remove local __NR_* definitions

2025-02-13 Thread Dave Hansen
On 2/13/25 00:04, John Hubbard wrote: > > 2) I'm unable to reproduce what you saw, because in ALL cases (before > or after the commit, and with or without a revert), I get the same > results on my Intel test machine: > >     $ ./protection_keys_64 >     has pkeys: 0 >     running PKEY tests for u

Re: [PATCH v2 6/6] selftests/mm: remove local __NR_* definitions

2025-02-12 Thread Dave Hansen
Hi John, On 6/13/24 19:30, John Hubbard wrote: > --- a/tools/testing/selftests/mm/protection_keys.c > +++ b/tools/testing/selftests/mm/protection_keys.c > @@ -42,7 +42,7 @@ > #include > #include > #include > -#include > +#include > #include > #include I'm not quite sure how but this b

Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming

2025-02-11 Thread Dave Hansen
On 2/11/25 16:32, andrzej zaborowski wrote: >> Actually, now that I think about it even more, why would ETRACK or >> EBLOCK access the page itself? They seem superficially like they'd be >> metadata-only too. > I haven't seen a crash in either of these (always in EWB), I didn't > want to imply that

Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming

2025-02-11 Thread Dave Hansen
On 2/11/25 13:18, Huang, Kai wrote: >>> This requires low-level SGX implementation knowledge to fully >>> understand. Both what "ETRACK, EBLOCK and EWB" are in the first place, >>> how they are involved in reclaim and also why EREMOVE doesn't lead to >>> the same fate. >> >> Does it? [I'll dig up I

Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming

2025-02-11 Thread Dave Hansen
I don't expect everyone to know the rules of every little part of the kernel. But, it's really easy to see a pattern with: git log arch/x86/kernel/cpu/sgx/ That usually works for every little nook and cranny of the kernel and will show you what the subject rules are. Could you do that fo

Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

2025-02-11 Thread Dave Hansen
On 2/11/25 05:33, Valentin Schneider wrote: >> 2. It's wrong to assume that TLB entries are only populated for >> addresses you access - thanks to speculative execution, you have to >> assume that the CPU might be populating random TLB entries all over >> the place. > Gotta love speculation. Now it

Re: [PATCH 06/10] x86/tdx: Mark message.str as nonstring

2025-02-06 Thread Dave Hansen
oes seem like we should probably not be using 'char' and also shouldn't call it anything close to "string". Maybe: u8 message[64] __nonstring; In any case, feel free to carry the annotation in your tree: Acked-by: Dave Hansen
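
To make the suggested declaration concrete, here is a small userspace sketch of a fixed-size byte buffer that is deliberately not NUL-terminated. `SKETCH_NONSTRING`, `struct sketch_report` and `sketch_set_message()` are hypothetical names; the macro expands to the compiler's `nonstring` attribute where available and to nothing otherwise.

```c
#include <stdint.h>
#include <string.h>

/* Use the compiler attribute when it exists, otherwise expand to nothing. */
#if defined(__has_attribute)
# if __has_attribute(nonstring)
#  define SKETCH_NONSTRING __attribute__((nonstring))
# endif
#endif
#ifndef SKETCH_NONSTRING
# define SKETCH_NONSTRING
#endif

typedef uint8_t u8;

/* Hypothetical container for a raw, fixed-size message. */
struct sketch_report {
	u8 message[64] SKETCH_NONSTRING;	/* raw bytes, may lack a NUL */
};

/* Copy at most 64 bytes of src; returns how many bytes were stored. */
static size_t sketch_set_message(struct sketch_report *r, const char *src)
{
	size_t n = strlen(src);

	if (n > sizeof(r->message))
		n = sizeof(r->message);

	memset(r->message, 0, sizeof(r->message));
	memcpy(r->message, src, n);	/* length-bounded, no terminator added */
	return n;
}
```

Because the buffer may be completely full, readers of `message` should rely on an explicit length (for example `printf("%.*s", ...)`) and avoid string functions that expect a terminator.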

Re: [PATCH 0/7] Move prefaulting into write slow paths

2025-01-30 Thread Dave Hansen
On 1/30/25 16:56, Kent Overstreet wrote: > On Thu, Jan 30, 2025 at 08:04:49AM -0800, Dave Hansen wrote:... >> Any suggestions for fully describing the situation? I tried to sprinkle >> comments liberally but I'm also painfully aware that I'm not doing a >> perfect j

Re: [PATCH 0/7] Move prefaulting into write slow paths

2025-01-30 Thread Dave Hansen
On 1/29/25 23:44, Kent Overstreet wrote: > On Wed, Jan 29, 2025 at 10:17:49AM -0800, Dave Hansen wrote: >> tl;dr: The VFS and several filesystems have some suspect prefaulting >> code. It is unnecessarily slow for the common case where a write's >> source buffer is reside

[PATCH 5/7] bcachefs: Move prefaulting out of hot write path

2025-01-29 Thread Dave Hansen
From: Dave Hansen Prefaulting the write source buffer incurs an extra userspace access in the common fast path. Make bch2_buffered_write() consistent with generic_perform_write(): only touch userspace an extra time when copy_page_from_iter_atomic() has failed to make progress. This also zaps
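
The idea in this patch description (try the copy first, and only touch the source buffer an extra time when the copy made no progress) can be modeled in plain userspace C. Everything below is a toy model with hypothetical names (`toy_user_buf`, `toy_copy_atomic()`, `toy_fault_in()`, `toy_buffered_write()`); it does not use or reproduce the kernel functions named in the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a userspace source buffer that may not be resident. */
struct toy_user_buf {
	const char *data;
	size_t len;
	bool resident;		/* false: an "atomic" copy makes no progress */
	unsigned int fault_ins;	/* how often the slow path ran */
};

/* Models an atomic copy: returns bytes copied, 0 if the source is absent. */
static size_t toy_copy_atomic(char *dst, const struct toy_user_buf *src,
			      size_t off, size_t n)
{
	if (!src->resident)
		return 0;
	memcpy(dst + off, src->data + off, n);
	return n;
}

/* Models faulting the source in; only the slow path calls this. */
static void toy_fault_in(struct toy_user_buf *src)
{
	src->resident = true;
	src->fault_ins++;
}

/* Copy the buffer in chunks, faulting in only after a failed copy. */
static size_t toy_buffered_write(char *dst, struct toy_user_buf *src,
				 size_t chunk)
{
	size_t done = 0;

	if (chunk == 0)
		return 0;	/* nothing sensible to do with empty chunks */

	while (done < src->len) {
		size_t n = src->len - done;
		size_t copied;

		if (n > chunk)
			n = chunk;

		copied = toy_copy_atomic(dst, src, done, n);	/* fast path */
		if (copied == 0) {
			toy_fault_in(src);	/* slow path: ensure progress */
			continue;
		}
		done += copied;
	}
	return done;
}
```

In this model a resident buffer never reaches `toy_fault_in()`, while a non-resident one pays for it once and then continues on the fast path.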

[PATCH 0/7] Move prefaulting into write slow paths

2025-01-29 Thread Dave Hansen
tl;dr: The VFS and several filesystems have some suspect prefaulting code. It is unnecessarily slow for the common case where a write's source buffer is resident and does not need to be faulted in. Move these "prefaulting" operations to slow paths where they ensure forward progress but they do not

Re: [PATCH v5 1/3] selftests/lam: Move cpu_has_la57() to use cpuinfo flag

2025-01-24 Thread Dave Hansen
On 1/24/25 12:17, Maciej Wieczor-Retman wrote: >> Could you poke around and see if there is any existing ABI that we can >> use to query LA57 support? Maybe one of the things KVM exports, or some >> TASK_SIZE_MAX comparisons? > Sure, I'll try to find some other way. > > My previous tactic was to m

Re: [PATCH v5 3/3] selftests/lam: Test get_user() LAM pointer handling

2025-01-24 Thread Dave Hansen
On 11/27/24 09:35, Maciej Wieczor-Retman wrote:
...
> +	switch (test->later) {
> +	case GET_USER_USER:
> +		/* Control group - properly tagger user pointer */
> +		ptr = (void *)set_metadata((uint64_t)ptr, test->lam);
> +		break;

s/tagger/tagged/ ?

> +

Re: [PATCH v5 2/3] selftests/lam: Skip test if LAM is disabled

2025-01-24 Thread Dave Hansen
On 11/27/24 09:35, Maciej Wieczor-Retman wrote:
> +static inline int kernel_has_lam(void)
> +{
> +	unsigned long bits;
> +
> +	syscall(SYS_arch_prctl, ARCH_GET_MAX_TAG_BITS, &bits);
> +	return !!bits;
> +}

Generally, I'm less picky about selftest/ code than in-kernel code. But people r

Re: [PATCH v5 1/3] selftests/lam: Move cpu_has_la57() to use cpuinfo flag

2025-01-24 Thread Dave Hansen
On 11/27/24 09:35, Maciej Wieczor-Retman wrote: > -/* Check 5-level page table feature in CPUID.(EAX=07H, ECX=00H):ECX.[bit 16] > */ > static inline int cpu_has_la57(void) > { > - unsigned int cpuinfo[4]; > - > - __cpuid_count(0x7, 0, cpuinfo[0], cpuinfo[1], cpuinfo[2], cpuinfo[3]); > -

Re: [PATCH v4 26/30] x86,tlb: Make __flush_tlb_global() noinstr-compliant

2025-01-14 Thread Dave Hansen
On 1/14/25 09:51, Valentin Schneider wrote:
> +	cr4 = this_cpu_read(cpu_tlbstate.cr4);
> +	asm volatile("mov %0,%%cr4": : "r" (cr4 ^ X86_CR4_PGE) : "memory");
> +	asm volatile("mov %0,%%cr4": : "r" (cr4) : "memory");
> +	/*
> +	 * In lieu of not having the pinning crap, hard fai

Re: [RFC PATCH v2 00/15] pkeys-based page table hardening

2025-01-09 Thread Dave Hansen
One of the stickier things in the x86 attempt to do the same thing was context switching, both between normal tasks and in/out of exceptions and interrupts. The easiest place this manifested for us was code chunks like this: kpkeys_set_level(KPKEYS_LVL_PGTABLES); // modify page tabl

Re: [PATCH 00/10] Account page tables at all levels

2024-12-20 Thread Dave Hansen
On 12/20/24 02:58, Kevin Brodsky wrote: >> One super tiny nit is that the PAE pgd _can_ be allocated using >> __get_free_pages(). It was originally there for Xen, but I think it's >> being used for PTI only at this point and the comments are wrong-ish. >> >> I kinda think we should just get rid of

Re: [PATCH 00/10] Account page tables at all levels

2024-12-20 Thread Dave Hansen
On 12/20/24 02:58, Kevin Brodsky wrote: >> Acked-by: Dave Hansen > Just to double-check, are your ack'ing the x86 changes specifically? If > so I'll add your Acked-by on patch 6, 7 and 9. Feel free to add it to each patch in the series.

Re: [PATCH 00/10] Account page tables at all levels

2024-12-19 Thread Dave Hansen
d just fine in the generic one: Acked-by: Dave Hansen One super tiny nit is that the PAE pgd _can_ be allocated using __get_free_pages(). It was originally there for Xen, but I think it's being used for PTI only at this point and the comments are wrong-ish. I kinda think we should just get rid o

Re: [EXTERNAL] [PATCH 1/9] x86/kexec: Disable global pages before writing to control page

2024-12-17 Thread Dave Hansen
On 12/17/24 06:56, David Woodhouse wrote: >> Anyway, I think we can leave the belt-and-suspenders programming in this >> case. A comment wouldn't hurt I guess. > I'm a little lost. In this case I don't see belt-and-suspenders > programming. We're not loading CR3 after clearing CR4.PGE just to be >

Re: [PATCH 1/9] x86/kexec: Disable global pages before writing to control page

2024-12-17 Thread Dave Hansen
On 12/17/24 04:25, Kirill A. Shutemov wrote: >> Clear the PGE bit in %cr4 early, before storing data in the control page. > It worth noting that flipping CR4.PGE triggers TLB flush. I was not sure > if CR3 write is required to make it happen. I thought about removing the CR3 write. But I decided a

Re: [PATCH] x86/kexec: Only write through identity mapping of control page

2024-12-12 Thread Dave Hansen
On 12/12/24 13:32, David Woodhouse wrote: > On 12 December 2024 21:18:10 GMT, Dave Hansen wrote: >> On 12/12/24 12:11, David Woodhouse wrote: >>> From: David Woodhouse >>> >>> The virtual mapping of the control page may have been _PAGE_GLOBAL and >>>

Re: [PATCH] x86/kexec: Only write through identity mapping of control page

2024-12-12 Thread Dave Hansen
On 12/12/24 12:11, David Woodhouse wrote: > From: David Woodhouse > > The virtual mapping of the control page may have been _PAGE_GLOBAL and > thus its PTE might not have been flushed on the %cr3 switch and it might > effectively still be read-only. Move the writes to it down into the > identity_

Re: [PATCH] Grab mm lock before grabbing pt lock

2024-12-05 Thread Dave Hansen
On 12/4/24 02:35, Maksym Planeta wrote: > Function xen_pin_page calls xen_pte_lock, which in turn grab page > table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock > to be held before grabbing ptlock, but this does not happen when pinning > is caused by xen_mm_pin_all. In chan

Re: [RFC PATCH] x86/mm: Disable PTI for kernel_ident_mapping_init()

2024-11-26 Thread Dave Hansen
On 11/26/24 03:42, David Woodhouse wrote: > I threw this version together and it didn't immediately explode... It's better than playing #define games. The damage is also pretty limited and it helps us avoid plumbing a bit through the page table handling function arguments.

Re: [RFC PATCH] x86/mm: Disable PTI for kernel_ident_mapping_init()

2024-11-25 Thread Dave Hansen
On 11/25/24 10:53, David Woodhouse wrote: >> I think we have a lot of software-available space in the page table >> pointer entries. What would folks think if we set a special bit in those >> p4d entries that said: >> >> "I don't need to be propagated to >> the user portion of the page ta

Re: [RFC PATCH] x86/mm: Disable PTI for kernel_ident_mapping_init()

2024-11-25 Thread Dave Hansen
On 11/25/24 09:05, David Woodhouse wrote: > Not sure I like this very much, but it works, and mirrors what > arch/x86/boot/compressed/ident_map_64.c already does. I don't like it much, either. arch/x86/boot/compressed/ is already on the road to sharing no code with the core kernel and it's full o

Re: [RFC PATCH v3 13/15] context_tracking,x86: Add infrastructure to defer kernel TLBI

2024-11-21 Thread Dave Hansen
On 11/21/24 03:12, Peter Zijlstra wrote: >> I see e.g. ds_clear_cea() clears PTEs that can have the _PAGE_GLOBAL flag, >> and it correctly uses the non-deferrable flush_tlb_kernel_range(). > > I always forget what we use global pages for, dhansen might know, but > let me try and have a look. > >

Re: [PATCH v2] Documentation/CoC: spell out enforcement for unacceptable behaviors

2024-11-12 Thread Dave Hansen
On 11/12/24 11:21, Daniel Vetter wrote: > Also, if a maintainer refuses to implement an enforcement decision, > will they be sanctioned too? Since this is all an entirely new section > and does not touch any of the existing sections I'm also not clear on > when one or the other rules apply, and how

Re: [PATCH v4 1/3] mm/pkey: Add PKEY_UNRESTRICTED macro

2024-11-08 Thread Dave Hansen
On 11/8/24 00:53, Yury Khrustalev wrote: > This patch adds PKEY_UNRESTRICTED macro defined as 0x0. Thanks for doing this and the follow-on selftests mods! Acked-by: Dave Hansen
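
As a small illustration of why a named constant reads better than a bare 0, the sketch below gives each access-rights value a label. The `#ifndef` guards are there in case system headers already provide the macros; `sketch_pkey_rights_name()` is a hypothetical helper, not part of the patch or of any library.

```c
/* Rights values from the Linux pkey API; guarded in case headers define them. */
#ifndef PKEY_UNRESTRICTED
#define PKEY_UNRESTRICTED	0x0
#endif
#ifndef PKEY_DISABLE_ACCESS
#define PKEY_DISABLE_ACCESS	0x1
#endif
#ifndef PKEY_DISABLE_WRITE
#define PKEY_DISABLE_WRITE	0x2
#endif

/* Hypothetical helper: describe an access-rights value in words. */
static const char *sketch_pkey_rights_name(unsigned int rights)
{
	if (rights == PKEY_UNRESTRICTED)
		return "unrestricted";
	if (rights & PKEY_DISABLE_ACCESS)
		return "no access";
	if (rights & PKEY_DISABLE_WRITE)
		return "read-only";
	return "unknown";
}
```

A call site that passes `PKEY_UNRESTRICTED` states its intent directly, where a literal 0 leaves the reader to recall that zero means no restrictions.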

Re: [PATCH v4 2/3] selftests/mm: Use PKEY_UNRESTRICTED macro

2024-11-08 Thread Dave Hansen
On 11/8/24 00:53, Yury Khrustalev wrote: > Replace literal 0 with macro PKEY_UNRESTRICTED where pkey_*() functions > are used in mm selftests for memory protection keys. Acked-by: Dave Hansen

Re: [RFC PATCH v3 0/6] Direct Map Removal for guest_memfd

2024-11-01 Thread Dave Hansen
On 11/1/24 12:29, Manwaring, Derek wrote: > As far as performance, are you talking about just the fracturing or > something beyond that? The data Mike brought to LSFMMBPF 2023 showed the > perf impact from direct map fragmentation for memfd_secret isn't "that > bad" [1]. Just the fracturing. Mike

Re: [RFC PATCH v3 0/6] Direct Map Removal for guest_memfd

2024-11-01 Thread Dave Hansen
On 11/1/24 11:31, Manwaring, Derek wrote: > From that standpoint I'm still tempted to turn the question around a bit > for the host kernel's perspective. Like if the host kernel should not > (and indeed cannot with TDX controls in place) access guest private > memory, why not remove it from the dir

Re: [RFC PATCH v3 0/6] Direct Map Removal for guest_memfd

2024-11-01 Thread Dave Hansen
On 11/1/24 09:56, Manwaring, Derek wrote: ... >>> Any software except guest TD or TDX module must not be able to >>> speculatively or non-speculatively access TD private memory, >> >> That's a pretty broad claim and it involves mitigations in hardware and >> the TDX module. >> >> 1. https://cdrdv2.

Re: [RFC PATCH v3 0/6] Direct Map Removal for guest_memfd

2024-11-01 Thread Dave Hansen
On 10/31/24 17:10, Manwaring, Derek wrote: > TDX and SEV encryption happens between the core and main memory, so > cached guest data we're most concerned about for transient execution > attacks isn't necessarily inaccessible. > > I'd be interested what Intel, AMD, and other folks think on this, bu

Re: [PATCH v3 4/5] selftests/mm: Use generic pkey register manipulation

2024-10-29 Thread Dave Hansen
The test changes look good to me: Acked-by: Dave Hansen

Re: [PATCH] x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()

2024-10-29 Thread Dave Hansen
On 10/29/24 08:13, Kirill A. Shutemov wrote: > On Wed, Oct 16, 2024 at 01:50:48PM +0300, Kirill A. Shutemov wrote: >> Rename the helper to better reflect its function. >> >> Signed-off-by: Kirill A. Shutemov >> Suggested-by: Dave Hansen > > KVM patch is Linus'

Re: [PATCH v2 4/5] selftests/mm: Use generic pkey register manipulation

2024-10-25 Thread Dave Hansen
On 10/25/24 01:31, Kevin Brodsky wrote: > I agree, the naming is not ideal, I lacked inspiration! Maybe > PKEY_REG_ALLOW_NONE to remain generic? Works for me. >>> static inline void __page_o_noops(void) >>> { >>> /* 8-bytes of instruction * 512 bytes = 1 page */ >>> diff --git a/tools/testi

Re: [PATCH v2 4/5] selftests/mm: Use generic pkey register manipulation

2024-10-23 Thread Dave Hansen
On 10/23/24 08:05, Kevin Brodsky wrote: ...> diff --git a/tools/testing/selftests/mm/pkey-x86.h b/tools/testing/selftests/mm/pkey-x86.h > index 5f28e26a2511..53ed9a336ffe 100644 > --- a/tools/testing/selftests/mm/pkey-x86.h > +++ b/tools/testing/selftests/mm/pkey-x86.h > @@ -34,6 +34,8 @@ > #defin

Re: [PATCH v2 2/5] mm: add PTE_MARKER_GUARD PTE marker

2024-10-21 Thread Dave Hansen
> >> Thanks for the suggestion though! > To put it on list - Dave Hansen commented on IRC that it would be safer to > avoid this for now due to this being an ABI change, and reasonable to > perhaps add it later if required, so that seems a sensible way forward. We added SEGV_PKUER

Re: [PATCH v3 0/5] x86/pvh: Make 64bit PVH entry relocatable

2024-09-25 Thread Dave Hansen
On 9/25/24 02:28, Juergen Gross wrote: > On 16.09.24 10:44, Juergen Gross wrote: >> x86 maintainers, >> >> are you going to pick this series up, or should I take it via the >> Xen tree? > > I take the silence as a "its okay to go via the Xen tree". Or, "most of us were traveling last week and in

Re: [PATCH v3 4/5] x86/kernel: Move page table macros to header

2024-09-25 Thread Dave Hansen
On 8/23/24 12:36, Jason Andryuk wrote: > The PVH entry point will need an additional set of prebuild page tables. > Move the macros and defines to pgtable_64.h, so they can be re-used. > > Signed-off-by: Jason Andryuk > Reviewed-by: Juergen Gross Acked-by: Dave Hansen

Re: [PATCH v5 06/30] arm64: context switch POR_EL0 register

2024-09-11 Thread Dave Hansen
On 9/11/24 08:01, Kevin Brodsky wrote: > On 22/08/2024 17:10, Joey Gouly wrote: >> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct >> kernel_clone_args *args) >> if (system_supports_tpidr2()) >> p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPID

Re: [PATCH v2 2/2] x86/sgx: Log information when a node lacks an EPC section

2024-09-05 Thread Dave Hansen
On 9/5/24 07:24, Jarkko Sakkinen wrote: >> +for_each_online_node(nid) { >> +if (!node_isset(nid, sgx_numa_mask) && >> +node_state(nid, N_MEMORY) && node_state(nid, N_CPU)) >> +pr_info("node%d has both CPUs and memory but doesn't >> have an EPC se

Re: [PATCH RFC v2 0/4] mm: Introduce MAP_BELOW_HINT

2024-08-30 Thread Dave Hansen
On 8/29/24 01:42, Lorenzo Stoakes wrote: >> These applications work on x86 because x86 does an implicit 47-bit >> restriction of mmap() address that contain a hint address that is less >> than 48 bits. > You mean x86 _has_ to limit to physically available bits in a canonical > format 🙂 this will no
