> tools/testing/selftests/x86/lam.c | 122 ++++++++--
> 1 file changed, 117 insertions(+), 5 deletions(-)
Apart from the nitpick in 1/3, looks good to me:
Acked-by: Kirill A. Shutemov
--
Kiryl Shutsemau / Kirill A. Shutemov
tem("cat /proc/cpuinfo | grep -wq la57\n");
Heh. grep can read files on its own :P
return !system("grep -wq la57 /proc/cpuinfo");
>
> - return (cpuinfo[2] & (1 << 16));
> + return !ret;
> }
>
> /*
> --
> 2.47.1
>
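For the record, the whole helper then collapses to something like this
(untested sketch; the function name is assumed, not taken from the patch):

	#include <stdlib.h>

	static int cpu_has_la57(void)
	{
		/* system() returns the shell exit status; grep -wq exits 0 on a match */
		return !system("grep -wq la57 /proc/cpuinfo");
	}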
--
Kiryl Shutsemau / Kirill A. Shutemov
> + },
> + {
> + .later = GET_USER_KERNEL_BOT,
> + .expected = 1,
> + .lam = LAM_U57_BITS,
> + .test_func = get_user_syscall,
> + .msg = "GET_USER:[Negative] get_user() with a kernel pointer
> and the bottom sign-extension bit cleared.\n",
> + },
> + {
> + .later = GET_USER_KERNEL,
> + .expected = 1,
> + .lam = LAM_U57_BITS,
> + .test_func = get_user_syscall,
> + .msg = "GET_USER:[Negative] get_user() and pass a kernel
> pointer.\n",
> + },
> };
>
> static struct testcases mmap_cases[] = {
> --
> 2.46.2
>
--
Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Aug 08, 2024 at 06:28:02PM +0300, Kirill A. Shutemov wrote:
> On Thu, Aug 08, 2024 at 11:03:30AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Aug 08, 2024 at 04:15:25PM +0300, Kirill A. Shutemov wrote:
> > > On Thu, Aug 08, 2024 at 08:10:34AM -0400, Michael S. Tsirkin wrote:
> + int ptr_value = 0;
> + void *ptr = &ptr_value;
> + int fd;
> +
> + uint64_t bitmask = ((uint64_t)ptr & L5_ADDR) ? L5_SIGN_EXT_MASK :
> +L4_SIGN_EXT_MASK;
Emm. Do you expect the stack to be at the very top of the address space on
5-level paging machines? It is not. We don't allocate any memory
above 46-bit unless asked explicitly.
See tools/testing/selftests/mm/va_high_addr_switch.c
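To actually get a pointer above the boundary, the test would have to ask
for it explicitly, along these lines (untested sketch; HIGH_ADDR is a
made-up name, the point is that the hint is above the default map window):

	#include <sys/mman.h>

	#define HIGH_ADDR	(1UL << 48)

	/* An explicit high hint opts the mapping in to the extended
	 * address space on 5-level paging kernels. */
	void *p = mmap((void *)HIGH_ADDR, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);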
--
Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Aug 08, 2024 at 11:03:30AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 08, 2024 at 04:15:25PM +0300, Kirill A. Shutemov wrote:
> > On Thu, Aug 08, 2024 at 08:10:34AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Aug 08, 2024 at 10:51:41AM +0300, Kirill A. Shutemov wrote:
On Thu, Aug 08, 2024 at 08:10:34AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 08, 2024 at 10:51:41AM +0300, Kirill A. Shutemov wrote:
> > Hongyu reported a hang on kexec in a VM. QEMU reported invalid memory
> > accesses during the hang.
> >
> > Invalid read
e
> >is not in use.
> >
> >Looks like virtio-console continues to write to the MMIO even after
> >underlying virtio-pci device is removed.
> >
> >The problem can be mitigated by removing all virtio devices on virtio
> >bus shutdown.
> >
> >Sig
...
It was traced down to virtio-console. Kexec works fine if virtio-console
is not in use.
Looks like virtio-console continues to write to the MMIO even after the
underlying virtio-pci device is removed.
The problem can be mitigated by removing all virtio devices on virtio
bus shutdown.
Signed-off-by
unsigned long __get_wchan(struct task_struct *p)
> return addr;
> }
>
> +static int get_coco_user_hcall_mode(void)
> +{
> + return !test_bit(MM_CONTEXT_COCO_USER_HCALL,
> + &current->mm->context.flags);
Hm. Why "!"?
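I.e. I'd expect just (sketch):

	return test_bit(MM_CONTEXT_COCO_USER_HCALL, &current->mm->context.flags);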
--
Kiryl Shutsemau / Kirill A. Shutemov
On Mon, Jul 22, 2024 at 10:04:40PM -0700, Tim Merrifield wrote:
>
> Thanks for the review, Kirill.
>
> On Mon, Jul 08, 2024 at 03:19:54PM +0300, Kirill A . Shutemov wrote:
> > Hm. Per-thread flag is odd. I think it should be per-process.
>
> This is the only
> +#ifdef CONFIG_INTEL_TDX_GUEST
> + .runtime.tdx_hcall = vmware_tdx_user_hcall,
> +#endif
> };
> --
> 2.40.1
>
--
Kiryl Shutsemau / Kirill A. Shutemov
long do_arch_prctl_common(int option, unsigned long arg2)
> {
> switch (option) {
> @@ -1052,6 +1067,11 @@ long do_arch_prctl_common(int option, unsigned long
> arg2)
> case ARCH_GET_XCOMP_GUEST_PERM:
> case ARCH_REQ_XCOMP_GUEST_PERM:
> return fpu_xstate_prctl(option, arg2);
> + case ARCH_GET_COCO_USER_HCALL:
> + return get_coco_user_hcall_mode();
> + case ARCH_SET_COCO_USER_HCALL:
> + return set_coco_user_hcall_mode(arg2);
> +
> }
>
> return -EINVAL;
> --
> 2.40.1
>
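FWIW, from userspace the new options would be exercised roughly like this
(sketch; assumes the raw syscall(2) wrapper and the constants added by
this patch):

	#include <sys/syscall.h>
	#include <unistd.h>

	/* Opt the process in to user hypercalls, then read the mode back */
	syscall(SYS_arch_prctl, ARCH_SET_COCO_USER_HCALL, 1);
	int mode = syscall(SYS_arch_prctl, ARCH_GET_COCO_USER_HCALL, 0);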
--
Kiryl Shutsemau / Kirill A. Shutemov
easier to follow as well.
>
> Cc:
> Cc: Rafael J. Wysocki
> Cc: Liu Shixin
> Cc: Dan Williams
> Cc: Kirill A. Shutemov
> Reported-by: Chris Piper
> Signed-off-by: Vishal Verma
Acked-by: Kirill A. Shutemov
--
Kiryl Shutsemau / Kirill A. Shutemov
> +}
> +
Hm. I think it indicates that these set_bit()s do not belong to
initiator_cmp().
Maybe remove both set_bit() from the compare helper and walk the list
separately to initialize the node mask? I think it will be easier to
follow.
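Something like this (untested sketch; list and field names taken from
context):

	/* Initialize the nodemask in a dedicated walk... */
	list_for_each_entry(initiator, &initiators, node)
		set_bit(initiator->processor_pxm, p_nodes);

	/* ...so the comparator is left as a pure comparison */
	list_sort(NULL, &initiators, initiator_cmp);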
--
Kiryl Shutsemau / Kirill A. Shutemov
Remove it.
>
> Cc: Rafael J. Wysocki
> Cc: Liu Shixin
> Cc: Dan Williams
> Signed-off-by: Vishal Verma
Acked-by: Kirill A. Shutemov
--
Kiryl Shutsemau / Kirill A. Shutemov
On Mon, Apr 19, 2021 at 08:09:13PM +0000, Sean Christopherson wrote:
> On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > On Mon, Apr 19, 2021 at 06:09:29PM +0000, Sean Christopherson wrote:
> > > On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > > > On Mon, Ap
On Mon, Apr 19, 2021 at 06:09:29PM +0000, Sean Christopherson wrote:
> On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > On Mon, Apr 19, 2021 at 04:01:46PM +0000, Sean Christopherson wrote:
> > > But fundamentally the private pages, are well, private. They can't be
>
On Mon, Apr 19, 2021 at 04:01:46PM +0000, Sean Christopherson wrote:
> On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > On Fri, Apr 16, 2021 at 05:30:30PM +0000, Sean Christopherson wrote:
> > > I like the idea of using "special" PTE value to denote guest private
On Fri, Apr 16, 2021 at 05:30:30PM +0000, Sean Christopherson wrote:
> On Fri, Apr 16, 2021, Kirill A. Shutemov wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 1b404e4d7dd8..f8183386abe7 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x
On Fri, Apr 16, 2021 at 06:10:30PM +0200, Borislav Petkov wrote:
> On Fri, Apr 16, 2021 at 06:40:55PM +0300, Kirill A. Shutemov wrote:
> > Provide basic helpers, KVM_FEATURE, CPUID flag and a hypercall.
> >
> > Host side doesn't provide the feature yet, so
> +static inline struct folio *folio_next(struct folio *folio)
> +{
> + return (struct folio *)folio_page(folio, folio_nr_pages(folio));
> +}
>
> (it occurs to me this should also be const-preserving, but it's not clear
> that's needed yet)
Are we risking that we would need to replace inline functions with macros
all the way down? Not sure const-preserving is worth it.
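E.g. a const-preserving folio_next() would likely have to become something
like this (hypothetical sketch), and then folio_page() and
folio_nr_pages() underneath it need the same treatment:

	/* typeof() keeps the caller's constness, at the cost of turning
	 * the inline function into a macro */
	#define folio_next(folio) \
		((typeof(folio))folio_page(folio, folio_nr_pages(folio)))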
--
Kirill A. Shutemov
t is missing.
[1]
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
Not-signed-off-by: Kirill A. Shutemov
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/cpuid.c | 3 +-
arch/x86/kvm/mmu/mmu.c | 12 ++-
arch/x86/kvm/mmu/paging_
Make struct kvm pointer available within hva_to_pfn_slow(). It is
preparation for the next patch.
Signed-off-by: Kirill A. Shutemov
---
arch/powerpc/kvm/book3s_64_mmu_hv.c| 2 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
arch/x86/kvm/mmu/mmu.c | 8 +++--
include/linux
hvclock is shared between the guest and the hypervisor. It has to be
accessible by the host.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/kernel/kvmclock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index
with hwpoison entries.
Keep the page referenced when setting up hwpoison entries, copy the
reference on fork(), and return it on zap().
Signed-off-by: Kirill A. Shutemov
---
mm/memory.c | 6 ++
mm/rmap.c | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c
If the page got unpoisoned, we can replace the hwpoison entry with a
present PTE on page fault instead of delivering SIGBUS.
Signed-off-by: Kirill A. Shutemov
---
mm/memory.c | 38 +-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm
Forbid access to poisoned pages.
TODO: Probably a more fine-grained approach is needed. It should be
allowed to fault in these pages as hwpoison entries.
Not-Signed-off-by: Kirill A. Shutemov
---
mm/shmem.c | 7 +++
1 file changed, 7 insertions(+)
diff --git a/mm/shmem.c b/mm/shmem.c
index
If KVM memory protection is active, the trampoline area will need to be
in shared memory.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/realmode/init.c | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index
Add helpers to convert hwpoison swap entry to pfn and page.
Signed-off-by: Kirill A. Shutemov
---
include/linux/swapops.h | 20
1 file changed, 20 insertions(+)
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index d9b7c9132c2f..520589b12fb3 100644
--- a
The new flag allows bypassing the check of whether the page is poisoned
and getting a reference on it.
Signed-off-by: Kirill A. Shutemov
---
include/linux/mm.h | 1 +
mm/gup.c | 29 ++---
2 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/include/linux/mm.h b
Mirror SEV, use SWIOTLB always if KVM memory protection is enabled.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/mem_encrypt.h | 7 +++--
arch/x86/kernel/kvm.c | 2 ++
arch/x86/kernel/pci-swiotlb.c | 3 +-
arch/x86/mm
Provide basic helpers, KVM_FEATURE, CPUID flag and a hypercall.
The host side doesn't provide the feature yet, so it is dead code for now.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm_para.h | 5 +
arch/x86/include/uap
page
allows to touch it
- FOLL_ALLOW_POISONED is implemented
The patchset can also be found here:
git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git kvm-unmapped-poison
Kirill A. Shutemov (13):
x86/mm: Move force_dma_unencrypted() to common code
x86/kvm: Introduce KVM memory
: Kirill A. Shutemov
---
arch/x86/Kconfig | 7 +-
arch/x86/include/asm/io.h| 4 +++-
arch/x86/mm/Makefile | 2 ++
arch/x86/mm/mem_encrypt.c| 30 -
arch/x86/mm/mem_encrypt_common.c | 38
5
Make force_dma_unencrypted() return true for KVM to get DMA pages mapped
as shared.
__set_memory_enc_dec() now informs the host via hypercall if the state
of the page has changed from shared to private or back.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/Kconfig | 1 +
arch
I have not looked at the rest of the patches yet, but why do you need a
special free path for shadow stack? Why doesn't the normal unmap route
work for you?
> + if (r == -EINTR) {
> + cond_resched();
> + continue;
> + }
> + break;
> + }
> +
> + cet->shstk_base = 0;
> + cet->shstk_size = 0;
> +}
> +
> +void shstk_disable(void)
> +{
> + struct cet_status *cet = &current->thread.cet;
> + u64 msr_val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
> + !cet->shstk_size ||
> + !cet->shstk_base)
> + return;
> +
> + start_update_msrs();
> + rdmsrl(MSR_IA32_U_CET, msr_val);
> + wrmsrl(MSR_IA32_U_CET, msr_val & ~CET_SHSTK_EN);
> + wrmsrl(MSR_IA32_PL3_SSP, 0);
> + end_update_msrs();
> +
> + shstk_free(current);
> +}
> --
> 2.21.0
>
>
--
Kirill A. Shutemov
k
> (RW=0, Dirty=1) PTEs, but the latter does not have _PAGE_RW and has no need
> to preserve it.
>
> Exclude shadow stack from preserve_write test, and apply the same change to
> change_huge_pmd().
>
> Signed-off-by: Yu-cheng Yu
> Cc: Kirill A. Shutemov
> ---
> v24:
>
by: Yu-cheng Yu
> Reviewed-by: Kees Cook
> Cc: Kirill A. Shutemov
> ---
> v24:
> - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping().
>
> mm/gup.c | 8 +---
> mm/huge_memory.c | 8 +---
> 2 files changed, 10 insertions(+), 6 deletions(-)
>
On Thu, Apr 01, 2021 at 03:10:52PM -0700, Yu-cheng Yu wrote:
> Account shadow stack pages to stack memory.
>
> Signed-off-by: Yu-cheng Yu
> Cc: Kees Cook
> Cc: Kirill A. Shutemov
> ---
> v24:
> - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping()
8 = 2040 bytes and
> 255 * 4 = 1020 bytes by INCSSPD. Both ranges are far from PAGE_SIZE.
> Thus, putting a gap page on both ends of a shadow stack prevents INCSSP,
> CALL, and RET from going beyond.
>
> Signed-off-by: Yu-cheng Yu
> Cc: Kees Cook
> Cc: Kirill A. Shutem
> - In change_pte_range(), pte_mkwrite() is called directly. Replace it with
> maybe_mkwrite().
>
> A shadow stack vma is writable but has different vma
> flags, and handled accordingly in maybe_mkwrite().
>
Have you checked the THP side? Looks like at least do_huge_pmd_numa_page()
needs adjustment, no?
--
Kirill A. Shutemov
e().
>
> Signed-off-by: Yu-cheng Yu
> Cc: Kees Cook
> Cc: Kirill A. Shutemov
> ---
> v24:
> - Instead of doing arch_maybe_mkwrite(), overwrite maybe*_mkwrite() with x86
> versions.
> - Change VM_SHSTK to VM_SHADOW_STACK.
>
> arch/x86/inclu
ed and both are
> handled as a write access.
>
> Signed-off-by: Yu-cheng Yu
> Reviewed-by: Kees Cook
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
VM_SHADOW_STACK to track shadow stack VMAs.
>
> Signed-off-by: Yu-cheng Yu
> Cc: Kees Cook
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
Peter Zijlstra provided many
> insights to the issue. Jann Horn provided the cmpxchg solution.
>
> Signed-off-by: Yu-cheng Yu
> Reviewed-by: Kees Cook
> Cc: Kirill A. Shutemov
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
get_user_pages() works - trigger a fault first, then
> lookup the PTE in the page tables).
For now, the patch does two-step poisoning: first fault the page in, then
poison it as it is added to the shadow PTE. By the time the VM has a chance
to use the page, it is poisoned and unmapped from the host userspace.
--
Kirill A. Shutemov
On Thu, Apr 08, 2021 at 11:52:35AM +0200, Borislav Petkov wrote:
> On Fri, Apr 02, 2021 at 06:26:40PM +0300, Kirill A. Shutemov wrote:
> > Provide basic helpers, KVM_FEATURE, CPUID flag and a hypercall.
> >
> > Host side doesn't provide the feature yet, so
On Wed, Apr 07, 2021 at 04:55:54PM +0200, David Hildenbrand wrote:
> On 02.04.21 17:26, Kirill A. Shutemov wrote:
> > TDX architecture aims to provide resiliency against confidentiality and
> > integrity attacks. Towards this goal, the TDX architecture helps enforce
> > t
On Wed, Apr 07, 2021 at 04:09:35PM +0200, David Hildenbrand wrote:
> On 07.04.21 15:16, Kirill A. Shutemov wrote:
> > On Tue, Apr 06, 2021 at 04:57:46PM +0200, David Hildenbrand wrote:
> > > On 06.04.21 16:33, Dave Hansen wrote:
> > > > On 4/6/21 12:44 AM, David H
On Tue, Apr 06, 2021 at 04:57:46PM +0200, David Hildenbrand wrote:
> On 06.04.21 16:33, Dave Hansen wrote:
> > On 4/6/21 12:44 AM, David Hildenbrand wrote:
> > > On 02.04.21 17:26, Kirill A. Shutemov wrote:
> > > > TDX architecture aims to provide resilien
On Tue, Apr 06, 2021 at 09:11:25AM -0700, Dave Hansen wrote:
> On 4/6/21 8:37 AM, Kirill A. Shutemov wrote:
> > On Thu, Apr 01, 2021 at 01:06:29PM -0700, Dave Hansen wrote:
> >> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> >>> From: "Kirill A. Shute
cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
>
> That "!enc" looks wrong to me. Caches would need to be flushed whenever
> encryption attributes *change*, not just when they are set.
>
> Also, cpa_flush() flushes caches *AND* the TLB. How does TDX manage to
> not need TLB flushes?
I will double-check everything, but I think we can skip cpa_flush()
entirely (*both* the cache and TLB flush) for private->shared conversion.
VMM and TDX module will take care of TLB and cache flushing in response
to the MapGPA TDVMCALL.
> > ret = __change_page_attr_set_clr(&cpa, 1);
> >
> > @@ -2012,6 +2020,11 @@ static int __set_memory_enc_dec(unsigned long addr,
> > int numpages, bool enc)
> > */
> > cpa_flush(&cpa, 0);
> >
> > + if (!ret && is_tdx_guest()) {
> > + ret = tdx_map_gpa(__pa(addr), numpages, enc);
> > + // XXX: need to undo on error?
> > + }
>
> Time to fix this stuff up if you want folks to take this series more
> seriously.
My bad, will fix it.
--
Kirill A. Shutemov
On Thu, Apr 01, 2021 at 01:26:23PM -0700, Dave Hansen wrote:
> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> > From: "Kirill A. Shutemov"
> >
> > All ioremap()ed pages that are not backed by normal memory (NONE or
> > RESERVED) have to be
ng is more fragile
there.
I would rather keep it as is. We should be fine as long as we only allow
clearing bits from the mask.
--
Kirill A. Shutemov
On Thu, Apr 01, 2021 at 01:06:29PM -0700, Dave Hansen wrote:
> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> > From: "Kirill A. Shutemov"
> >
> > Intel TDX doesn't allow VMM to access guest memory. Any memory that is
> > required for c
Christoph, I'm not a fan of this :/
>
> What would you prefer?
I liked the earlier approach with only struct page here. Once we know a
field should never be referenced from raw struct page, we can move it here.
But feel free to ignore my suggestion. It's not a show-stopper for me and
revert
te;
> + atomic_t _mapcount;
> + atomic_t _refcount;
> +#ifdef CONFIG_MEMCG
> + unsigned long memcg_data;
> +#endif
As Christoph, I'm not a fan of this :/
> + /* private: the union with struct page is transitional */
> + };
> + struct page page;
> + };
> +};
--
Kirill A. Shutemov
On Tue, Apr 06, 2021 at 09:44:07AM +0200, David Hildenbrand wrote:
> On 02.04.21 17:26, Kirill A. Shutemov wrote:
> > TDX architecture aims to provide resiliency against confidentiality and
> > integrity attacks. Towards this goal, the TDX architecture helps enforce
> > t
hvclock is shared between the guest and the hypervisor. It has to be
accessible by the host.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/kernel/kvmclock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index
Mirror SEV, use SWIOTLB always if KVM memory protection is enabled.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/mem_encrypt.h | 7 +++--
arch/x86/kernel/kvm.c | 2 ++
arch/x86/kernel/pci-swiotlb.c | 3 +-
arch/x86/mm
Provide basic helpers, KVM_FEATURE, CPUID flag and a hypercall.
The host side doesn't provide the feature yet, so it is dead code for now.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm_para.h | 5 +
arch/x86/include/uap
: Kirill A. Shutemov
---
arch/x86/Kconfig | 7 +-
arch/x86/include/asm/io.h| 4 +++-
arch/x86/mm/Makefile | 2 ++
arch/x86/mm/mem_encrypt.c| 30 -
arch/x86/mm/mem_encrypt_common.c | 38
5
Make force_dma_unencrypted() return true for KVM to get DMA pages mapped
as shared.
__set_memory_enc_dec() now informs the host via hypercall if the state
of the page has changed from shared to private or back.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/Kconfig | 1 +
arch
the issue.
The core of the change is in the last patch. Please see a more detailed
description of the issue and the proposed solution there.
The patchset can also be found here:
git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git kvm-unmapped-poison
Kirill A. Shutemov (7):
x86/mm: Move
n.
- Poisoned pages must be tied to a KVM instance and another KVM must not
be able to map the page into a guest.
[1]
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
Not-signed-off-by: Kirill A. Shutemov
---
arch/x86/kvm/Kconfig
If KVM memory protection is active, the trampoline area will need to be
in shared memory.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/realmode/init.c | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index
On Fri, Mar 26, 2021 at 08:46:30AM -0700, Yu, Yu-cheng wrote:
> On 3/22/2021 3:57 AM, Kirill A. Shutemov wrote:
> > On Tue, Mar 16, 2021 at 08:10:44AM -0700, Yu-cheng Yu wrote:
> > > Account shadow stack pages to stack memory.
> > >
> > > Signed-off-by: Yu-ch
> > > 00fc
> > > > [0.145129] R10: 990b80013c78 R11: 990b80013c7d R12:
> > > 972dc74ada80
> > > > [0.145129] R13: 972d474038c0 R14: R15:
> > > 00000000
> > > > [0.145129] FS: () GS:972d47a0()
> > > knlGS:
> > > > [0.145129] CS: 0010 DS: ES: CR0: 80050033
> > > > [0.145129] CR2: CR3: 0660a000 CR4:
> > > 003406f0
> > > > [0.145129] Call Trace:
> > > > [0.145129] acpi_os_release_object+0x5/0x10
> > > > [0.145129] acpi_ns_delete_children+0x46/0x59
> > > > [0.145129] acpi_ns_delete_namespace_subtree+0x5c/0x79
> > > > [0.145129] ? acpi_sleep_proc_init+0x1f/0x1f
> > > > [0.145129] acpi_ns_terminate+0xc/0x31
> > > > [0.145129] acpi_ut_subsystem_shutdown+0x45/0xa3
> > > > [0.145129] ? acpi_sleep_proc_init+0x1f/0x1f
> > > > [0.145129] acpi_terminate+0x5/0xf
> > > > [0.145129] acpi_init+0x27b/0x308
> > > > [0.145129] ? video_setup+0x79/0x79
> > > > [0.145129] do_one_initcall+0x7b/0x160
> > > > [0.145129] kernel_init_freeable+0x190/0x1f2
> > > > [0.145129] ? rest_init+0x9a/0x9a
> > > > [0.145129] kernel_init+0x5/0xf6
> > > > [0.145129] ret_from_fork+0x22/0x30
> > > > [0.145129] ---[ end trace 574554fca7bd06bb ]---
> > > > [0.145133] INFO: Allocated in acpi_ns_root_initialize+0xb6/0x2d1
> > > > age=58
> > > cpu=0 pid=0
> > > > [0.145881] kmem_cache_alloc_trace+0x1a9/0x1c0
> > > > [0.146132] acpi_ns_root_initialize+0xb6/0x2d1
> > > > [0.146578] acpi_initialize_subsystem+0x65/0xa8
> > > > [0.147024] acpi_early_init+0x5d/0xd1
> > > > [0.147132] start_kernel+0x45b/0x518
> > > > [0.147491] secondary_startup_64+0xb6/0xc0
> > > > [0.147897] [ cut here ]
> > > >
> > > > And it seems ACPI is allocating an object via kmalloc() and then
> > > > freeing it via kmem_cache_free(<"Acpi-Namespace" kmem_cache>) which
> > > is wrong.
> > > > > ./scripts/faddr2line vmlinux 'acpi_ns_root_initialize+0xb6'
> > > > acpi_ns_root_initialize+0xb6/0x2d1:
> > > > kmalloc at include/linux/slab.h:555
> > > > (inlined by) kzalloc at include/linux/slab.h:669 (inlined by)
> > > > acpi_os_allocate_zeroed at include/acpi/platform/aclinuxex.h:57
> > > > (inlined by) acpi_ns_root_initialize at
> > > > drivers/acpi/acpica/nsaccess.c:102
> > > >
> > Hi Vegard,
> >
> > > That's it :-) This fixes it for me:
> > We'll take this patch for ACPICA and it will be in the next release.
> >
> > Rafael, do you want to take this as a part of the next rc?
>
> Yes, I do.
Folks, what happened to the patch? I don't see it in current upstream.
Looks like it got reported again:
https://lore.kernel.org/r/a1461e21-c744-767d-6dfc-6641fd3e3...@siemens.com
--
Kirill A. Shutemov
Signed-off-by: Yanfei Xu
Acked-by: Kirill A. Shutemov
--
Kirill A. Shutemov
On Mon, Mar 22, 2021 at 11:46:21AM +0100, Peter Zijlstra wrote:
> On Mon, Mar 22, 2021 at 01:15:02PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Mar 16, 2021 at 08:10:38AM -0700, Yu-cheng Yu wrote:
>
> > > + pte_t old_pte, new_pte;
> > > +
> > &
On Tue, Mar 16, 2021 at 08:10:39AM -0700, Yu-cheng Yu wrote:
> +#ifdef CONFIG_X86_CET
> +# define VM_SHSTK	VM_HIGH_ARCH_5
> +#else
> +# define VM_SHSTK	VM_NONE
> +#endif
> +
Why not VM_SHADOW_STACK? A random reader may think SH stands for SHARED or
something.
--
Kirill A. Shutemov
_PAGE_DIRTY or _PAGE_COW.
>
> Apply the same changes to pmd_modify().
>
> Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
> Cc: David Airlie
> Cc: Joonas Lahtinen
> Cc: Jani Nikula
> Cc: Daniel Vetter
> Cc: Rodrigo Vivi
> Cc: Zhenyu Wang
> Cc: Zhi Wang
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
fine _PAGE_COW and update pte_*() helpers and apply the same changes to
> pmd and pud.
>
> After this, there are six free bits left in the 64-bit PTE, and no more
> free bits in the 32-bit PTE (except for PAE) and Shadow Stack is not
> implemented for the 32-bit kernel.
>
> Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
On Tue, Mar 16, 2021 at 08:10:34AM -0700, Yu-cheng Yu wrote:
> To prepare the introduction of _PAGE_COW, move pmd_write() and
> pud_write() up in the file, so that they can be used by other
> helpers below.
>
> Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
return -EINVAL;
> +
> if (!file_mmap_ok(file, inode, pgoff, len))
> return -EOVERFLOW;
>
> @@ -1545,7 +1551,7 @@ unsigned long do_mmap(struct file *file, unsigned long
> addr,
> } else {
> switch (flags & MAP_TYPE) {
> case MAP_SHARED:
> - if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
> + if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP|VM_SHSTK))
> return -EINVAL;
> /*
>* Ignore pgoff.
> --
> 2.21.0
>
--
Kirill A. Shutemov
s)
> mm->stack_vm += npages;
> else if (is_data_mapping(flags))
> mm->data_vm += npages;
> + else if (arch_shadow_stack_mapping(flags))
> + mm->stack_vm += npages;
Ditto.
> }
>
> static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
> --
> 2.21.0
>
--
Kirill A. Shutemov
_end_gap(struct vm_area_struct *vma)
> {
> unsigned long vm_end = vma->vm_end;
> + unsigned long gap = 0;
> +
> + if (vma->vm_flags & VM_GROWSUP)
> + gap = stack_guard_gap;
> + else if (vma->vm_flags & VM_SHSTK)
> + gap = ARCH_SHADOW_STACK_GUARD_GAP;
>
> - if (vma->vm_flags & VM_GROWSUP) {
> - vm_end += stack_guard_gap;
> + if (gap != 0) {
> + vm_end += gap;
> if (vm_end < vma->vm_end)
> vm_end = -PAGE_SIZE;
> }
> --
> 2.21.0
>
--
Kirill A. Shutemov
override maybe_mkwrite()
and maybe_pmd_mkwrite() altogether. Wrap it into #ifndef maybe_mkwrite
here and provide a VM_SHSTK-aware version from the x86 headers.
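I.e. roughly (sketch):

	/* Generic version, used unless the arch overrides the macro */
	#ifndef maybe_mkwrite
	static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
	{
		if (likely(vma->vm_flags & VM_WRITE))
			pte = pte_mkwrite(pte);
		return pte;
	}
	#endif

with x86 defining a maybe_mkwrite() that also checks VM_SHSTK.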
--
Kirill A. Shutemov
* This method cannot distinguish shadow stack read vs. write.
> + * For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect
> + * copy-on-write.
> + */
> + if (error_code & X86_PF_SHSTK)
> + flags |= FAULT_FLAG_WRITE;
> if (error_code & X86_PF_WRITE)
> flags |= FAULT_FLAG_WRITE;
> if (error_code & X86_PF_INSTR)
> --
> 2.21.0
>
--
Kirill A. Shutemov
> + old_pmd = READ_ONCE(*pmdp);
> + do {
> + new_pmd = pmd_wrprotect(old_pmd);
> + } while (!try_cmpxchg((pmdval_t *)pmdp, (pmdval_t *)&old_pmd,
> pmd_val(new_pmd)));
> +
> + return;
> + }
> clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
> }
>
> --
> 2.21.0
>
--
Kirill A. Shutemov
Anvin"
> Cc: Kees Cook
> Cc: Thomas Gleixner
> Cc: Dave Hansen
> Cc: Christoph Hellwig
> Cc: Andy Lutomirski
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: Peter Zijlstra
Looks good to me.
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
> > [16247.544148] RSP: :a18cb0af7a40 EFLAGS: 00010246
> > [16247.544153] RAX: 0036 RBX: 000d RCX:
> > 8ef13fc9a748
> > [16247.544158] RDX: RSI: 0027 RDI:
> > 8ef13fc9a740
> > [16247.544162] RBP: 8eb2f9a02ef8 R08: 8ef23ffb48a8 R09:
> > 0004fffb
> > [16247.544166] R10: R11: 3fff R12:
> > 1400
> > [16247.544170] R13: 8eb2f9a02f00 R14: R15:
> > d651b1978000
> > [16247.544175] FS: 7f97c1717740() GS:8ef13fc8()
> > knlGS:
> > [16247.544180] CS: 0010 DS: ES: CR0: 80050033
> > [16247.544184] CR2: 7f97c0efec0d CR3: 0040aa3ac006 CR4:
> > 007706e0
> > [16247.544188] DR0: DR1: DR2:
> >
> > [16247.544191] DR3: DR6: fffe0ff0 DR7:
> > 0400
> > [16247.544194] PKRU: 5554
> > [16247.546763] BUG: Bad rss-counter state mm:060c94f4
> > type:MM_ANONPAGES val:8
> >
> >
--
Kirill A. Shutemov
On Mon, Mar 15, 2021 at 01:25:40PM +0100, David Hildenbrand wrote:
> On 15.03.21 13:22, Kirill A. Shutemov wrote:
> > On Mon, Mar 08, 2021 at 05:45:20PM +0100, David Hildenbrand wrote:
> > > + case -EHWPOISON: /* Skip over
obvious to me.
--
Kirill A. Shutemov
ogram is added to
> tools/testing/selftests/vm to utilize the interface by splitting
> PMD THPs and PTE-mapped THPs.
>
Okay, makes sense.
But it doesn't cover non-mapped THPs. tmpfs may have a file backed by THP
that is mapped nowhere. Do we want to cover this case too?
Maybe have PID:<pid>,<vaddr_start>,<vaddr_end> and
FILE:<path>,<off_start>,<off_end> ?
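Splitting a range of an unmapped tmpfs file would then be, e.g., writing
"FILE:/mnt/tmpfs/file,0x0,0x200000" (a hypothetical instance of that
format) to the debugfs file.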
--
Kirill A. Shutemov
erformance reports.
>
> To me, it's distraction, churn and friction, ongoing for years; but
> that's just me, and I'm resigned to the possibility that it will go in.
> Matthew is not alone in wanting to pursue it: let others speak.
I'm with Matthew on this. I would really want to drop the number of places
where we call compound_head(). I hope we can get rid of the page flag
policy hack I made.
--
Kirill A. Shutemov
> 1 file changed, 20 insertions(+), 27 deletions(-)
Apart from patch 4/5, looks fine. For the rest, you can use:
Acked-by: Kirill A. Shutemov
--
Kirill A. Shutemov
khugepaged_do_scan() and with the change mem_cgroup_charge() may get
called twice for two different mm_structs.
Is it safe?
--
Kirill A. Shutemov
on to few
page flags will only complicate the picture.
--
Kirill A. Shutemov
On Sun, Feb 07, 2021 at 08:01:50AM -0800, Dave Hansen wrote:
> On 2/7/21 6:13 AM, Kirill A. Shutemov wrote:
> >>> + /* Allow to pass R10, R11, R12, R13, R14 and R15 down to the VMM
> >>> */
> >>> + rcx = BIT(10) | BIT(11)
On Fri, Feb 05, 2021 at 05:06:20PM -0600, Seth Forshee wrote:
> This feature requires ino_t be 64-bits, which is true for every
> 64-bit architecture but s390, so prevent this option from being
> selected there.
Quick grep suggests the same for alpha. Am I wrong?
--
Kirill A. Shutemov
On Fri, Feb 05, 2021 at 03:42:01PM -0800, Andy Lutomirski wrote:
> On Fri, Feb 5, 2021 at 3:39 PM Kuppuswamy Sathyanarayanan
> wrote:
> >
> > From: "Kirill A. Shutemov"
> >
> > TDX has three classes of CPUID leaves: some CPUID leaves
> > are always
On Sun, Feb 07, 2021 at 09:24:23AM +0100, Dmitry Vyukov wrote:
> On Fri, Feb 5, 2021 at 4:16 PM Kirill A. Shutemov
> wrote:
> >
> > Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > 64-bit linear addresses, allowing software to use of the untra
On Sun, Feb 07, 2021 at 09:07:02AM +0100, Dmitry Vyukov wrote:
> On Fri, Feb 5, 2021 at 4:43 PM H.J. Lu wrote:
> >
> > On Fri, Feb 5, 2021 at 7:16 AM Kirill A. Shutemov
> > wrote:
> > >
> > > Provide prctl() interface to enabled LAM for user addresses. Depe
strips the metadata bits from the address and gets it to canonical shape
before handling memory access. It has to be done very early, before
TLB lookup.
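For LAM_U57 that canonicalization amounts to sign-extending bit 56 over
the metadata bits 62:57, i.e. (hedged sketch, helper name made up):

	static uint64_t lam57_untag(uint64_t addr)
	{
	    /* shift the metadata bits out; arithmetic shift copies bit 56 back in */
	    return (uint64_t)(((int64_t)addr << 7) >> 7);
	}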
Signed-off-by: Kirill A. Shutemov
---
accel/tcg/cputlb.c| 54 +++
include/hw/core/cpu.h | 1
LAM_U48 steals bits above 47-bit for tags and makes it impossible for
userspace to use the full address space on 5-level paging machines.
Make these features mutually exclusive: whichever gets enabled first
blocks the other one.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/elf.h
On Fri, Feb 05, 2021 at 04:49:05PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 05, 2021 at 06:16:20PM +0300, Kirill A. Shutemov wrote:
> > The feature competes for bits with 5-level paging: LAM_U48 makes it
> > impossible to map anything above 47-bits. The patchset made these
The new thread flags indicate that the thread has Linear Address Masking
enabled.
switch_mm_irqs_off() now respects these flags and sets CR3 accordingly.
The active LAM mode gets recorded in the tlb_state.
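Concretely, the LAM mode travels in CR3 (bit 61 enables LAM_U57, bit 62
LAM_U48), so the switch boils down to something like this (sketch; helper
names are illustrative):

	#define X86_CR3_LAM_U57		(1UL << 61)
	#define X86_CR3_LAM_U48		(1UL << 62)

	/* OR the mm's recorded LAM mask into the CR3 value built for it */
	cr3 = build_cr3(next->pgd, new_asid) | lam_cr3_mask;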
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/thread_info.h | 9 ++-
arch