that for safety on some AMD CPUs, this relies on recent commit
> 86e6b1547b3d ("x86: fix user address masking non-canonical speculation
> issue").
>
> Link: https://lore.kernel.org/202410281344.d02c72a2-oliver.s...@intel.com
> Signed-off-by: Josh Poimboeuf
Acked-by: Kirill A. Shutemov
--
Kiryl Shutsemau / Kirill A. Shutemov
for safety).
Okay, fair enough.
--
Kiryl Shutsemau / Kirill A. Shutemov
On Wed, Oct 16, 2024 at 03:34:11PM -0700, Linus Torvalds wrote:
> On Wed, 16 Oct 2024 at 15:13, Kirill A. Shutemov wrote:
> >
> > It is worse than that. If we get LAM_SUP enabled (there's KASAN patchset
> > in works) this check will allow arbitrary kernel addresses.
On Tue, Oct 22, 2024 at 01:16:58AM -0700, Pawan Gupta wrote:
> On Mon, Oct 21, 2024 at 01:48:15PM +0300, Kirill A. Shutemov wrote:
> > On Sun, Oct 20, 2024 at 03:59:25PM -0700, Linus Torvalds wrote:
> > > On Sun, 20 Oct 2024 at 15:44, Josh Poimboeuf wrote:
> > > >
On Mon, Oct 21, 2024 at 07:36:50PM -0700, Linus Torvalds wrote:
> On Mon, 21 Oct 2024 at 03:48, Kirill A. Shutemov wrote:
> >
> > LAM brings own speculation issues[1] that is going to be addressed by
> > LASS[2]. There was a patch[3] to disable LAM until LASS is landed,
but it never got applied for some reason.
[1] https://download.vusec.net/papers/slam_sp24.pdf
[2]
https://lore.kernel.org/all/20240710160655.3402786-1-alexander.shish...@linux.intel.com
[3]
https://lore.kernel.org/all/5373262886f2783f054256babdf5a98545dc986b.1706068222.git.pawan.kumar.gu...@linux.intel.com
--
Kiryl Shutsemau / Kirill A. Shutemov
ut_user_2)
> EXPORT_SYMBOL(__put_user_2)
This patch provides an opportunity to give these labels more meaningful
names, so that future rearrangements do not require as much boilerplate.
For example, we can rename this label 2: to .Luser_2 or something similar.
--
Kiryl Shutsemau / Kirill A. Shutemov
semantics, does it?
>
> Consider userspace passing an otherwise-good pointer with bit 60 set.
> Previously that would have resulted in a failure, whereas now it will
> succeed.
It is worse than that. If we get LAM_SUP enabled (there's a KASAN patchset
in the works), this check will allow arbitrary kernel addresses.
--
Kiryl Shutsemau / Kirill A. Shutemov
e ASM_BARRIER_NOSPEC ALTERNATIVE "", "lfence", X86_FEATURE_LFENCE_RDTSC
+#define SHIFT_LEFT_TO_MSB ALTERNATIVE \
+ "shl $(64 - 48), %rdx", \
+ "shl $(64 - 57), %rdx", X86_FEATURE_LA57
+
.macro check_range size:req
.if IS_ENABLED(CONFIG_X86_64)
mov %rax, %rdx
+ SHIFT_LEFT_TO_MSB
sar $63, %rdx
or %rdx, %rax
.else
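For reference, here is a C sketch of what this masking does with 4-level
paging (the helper name is made up, purely for illustration):

	static inline unsigned long mask_noncanonical(unsigned long addr)
	{
		/* Shift bit 47 (bit 56 with LA57) up into bit 63, then use an
		 * arithmetic shift to replicate it across the whole word. */
		long mask = (long)(addr << (64 - 48)) >> 63;

		/* Addresses with that bit set become all-ones and are
		 * guaranteed to fault; user addresses pass through unchanged. */
		return addr | mask;
	}

With X86_FEATURE_LA57 the shift count becomes 64 - 57, so bit 56 is the one
that gets replicated, matching the ALTERNATIVE above.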
--
Kiryl Shutsemau / Kirill A. Shutemov
's out there that actually
> have LAM enabled.
Actually LAM is fine with the __VIRTUAL_MASK_SHIFT check. LAM enforces bit
47 (or bit 56 for 5-level paging) to be equal to bit 63. Otherwise it is a
canonicality violation.
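In other words (an illustrative pseudo-helper, not actual kernel code):

	/* A LAM-tagged pointer is only canonical if the topmost implemented
	 * address bit matches bit 63; vaddr_bits is 48 or 57. */
	static inline bool lam_canonical(unsigned long addr, unsigned int vaddr_bits)
	{
		return ((addr >> (vaddr_bits - 1)) & 1) == (addr >> 63);
	}

So a user pointer with metadata in the ignored bits still has to keep bit 47
(or bit 56) clear, and the __VIRTUAL_MASK_SHIFT-based check keeps working.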
--
Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Sep 05, 2024 at 10:26:52AM -0700, Charlie Jenkins wrote:
> On Thu, Sep 05, 2024 at 09:47:47AM +0300, Kirill A. Shutemov wrote:
> > On Thu, Aug 29, 2024 at 12:15:57AM -0700, Charlie Jenkins wrote:
> > > Some applications rely on placing data in free bits addresses alloc
e got tested on
x86 with 47-bit VA.
We can consider more options to opt in to the wider address space, like a
personality or a prctl() handle. But opt-out is a no-go from what I see.
--
Kiryl Shutsemau / Kirill A. Shutemov
et = 0;
return vm_unmapped_area(&info);
}
--
Kiryl Shutsemau / Kirill A. Shutemov
On Tue, Jul 25, 2023 at 01:51:55PM +0100, Matthew Wilcox wrote:
> On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote:
> > On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote:
> > > diff --git a/mm/compaction.c b/mm/compaction.c
the
mapping is still tied to the folio).
Vlastimil, any comments?
--
Kiryl Shutsemau / Kirill A. Shutemov
arch/m68k/Kconfig.cpu | 16 +---
> arch/nios2/Kconfig| 17 +
> arch/powerpc/Kconfig | 22 +-
> arch/sh/mm/Kconfig| 19 +--
> arch/sparc/Kconfig| 16 +++++---
> arch/xtensa/Kconfig | 16 +---
> 10 files changed, 76 insertions(+), 80 deletions(-)
Acked-by: Kirill A. Shutemov
--
Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Sep 23, 2021 at 08:21:03PM +0200, Borislav Petkov wrote:
> On Thu, Sep 23, 2021 at 12:05:58AM +0300, Kirill A. Shutemov wrote:
> > Unless we find other way to guarantee RIP-relative access, we must use
> > fixup_pointer() to access any global variables.
>
> Yah, I
On Wed, Sep 22, 2021 at 09:52:07PM +0200, Borislav Petkov wrote:
> On Wed, Sep 22, 2021 at 05:30:15PM +0300, Kirill A. Shutemov wrote:
> > Not fine, but waiting to blowup with random build environment change.
>
> Why is it not fine?
>
> Are you suspecting that the co
On Wed, Sep 22, 2021 at 08:40:43AM -0500, Tom Lendacky wrote:
> On 9/21/21 4:58 PM, Kirill A. Shutemov wrote:
> > On Tue, Sep 21, 2021 at 04:43:59PM -0500, Tom Lendacky wrote:
> > > On 9/21/21 4:34 PM, Kirill A. Shutemov wrote:
> > > > On Tue, Sep 21, 2021 at 11:
On Tue, Sep 21, 2021 at 04:43:59PM -0500, Tom Lendacky wrote:
> On 9/21/21 4:34 PM, Kirill A. Shutemov wrote:
> > On Tue, Sep 21, 2021 at 11:27:17PM +0200, Borislav Petkov wrote:
> > > On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote:
> > >
On Tue, Sep 21, 2021 at 11:27:17PM +0200, Borislav Petkov wrote:
> On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote:
> > I still believe calling cc_platform_has() from __startup_64() is totally
> > broken as it lacks proper wrapping while accessing global varia
mm/mem_encrypt_identity.c
@@ -288,7 +288,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
unsigned long pgtable_area_len;
unsigned long decrypted_base;
- if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
+ if (1 || !cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
return;
/*
--
Kirill A. Shutemov
ave a special version of
the helper). Note that only AMD requires these cc_platform_has() calls to
return true.
--
Kirill A. Shutemov
On Wed, Aug 11, 2021 at 10:52:55AM -0500, Tom Lendacky wrote:
> On 8/11/21 7:19 AM, Kirill A. Shutemov wrote:
> > On Tue, Aug 10, 2021 at 02:48:54PM -0500, Tom Lendacky wrote:
> >> On 8/10/21 1:45 PM, Kuppuswamy, Sathyanarayanan wrote:
> >>>
> >>>
thing with this shared/unencrypted
> area, though? Or since it is shared, there's actually nothing you need to
> do (the bss decrpyted section exists even if CONFIG_AMD_MEM_ENCRYPT is not
> configured)?
AFAICS, only kvmclock uses __bss_decrypted. We don't enable kvmclock in
TDX at the moment. It may change in the future.
--
Kirill A. Shutemov
On Tue, Jun 08, 2021 at 04:47:19PM +0530, Aneesh Kumar K.V wrote:
> On 6/8/21 3:12 PM, Kirill A. Shutemov wrote:
> > On Tue, Jun 08, 2021 at 01:22:23PM +0530, Aneesh Kumar K.V wrote:
> > >
> > > Hi Hugh,
> > >
> > > Hugh Dickins writes:
>
> and old pfn
>
> unlock(pud_ptl)
> ptep_clear_flush()
> old pfn is free.
>
> Stale TLB entry
>
> Both the above race condition can be fixed if we force mremap path to
> take rmap lock.
>
> Signed-off-by: Aneesh Kumar K.V
Looks like it should be enough to address the race.
It would be nice to understand what the performance overhead of the
additional locking is. Is it still faster to move a single PMD page table
under these locks compared to moving PTE page table entries without the locks?
--
Kirill A. Shutemov
ut you need to check it per
distro. For Debian it would be here:
https://distrowatch.com/table.php?distribution=debian
--
Kirill A. Shutemov
Not sure it's an issue, but strictly speaking, the size of a page according
to the page table tree doesn't mean the page walk will fill a TLB entry of
that size. A CPU may support 1G pages in the page table tree without having
any 1G TLB entries at all.
IIRC, current Intel CPUs still don't have any 1G iTLB entries and fill 2M
iTLB entries instead.
--
Kirill A. Shutemov
On Tue, Nov 03, 2020 at 02:13:50PM +0200, Mike Rapoport wrote:
> On Tue, Nov 03, 2020 at 02:08:16PM +0300, Kirill A. Shutemov wrote:
> > On Sun, Nov 01, 2020 at 07:08:13PM +0200, Mike Rapoport wrote:
> > > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
arch, mm: make kernel_page_present() always available
The series looks good to me (apart from the minor nit):
Acked-by: Kirill A. Shutemov
--
Kirill A. Shutemov
} else {
> + debug_pagealloc_map_pages(page, 1, enable);
> + }
> +}
> +
> static int swsusp_page_is_free(struct page *);
> static void swsusp_set_page_forbidden(struct page *);
> static void swsusp_unset_page_forbidden(struct page *);
--
Kirill A. Shutemov
On Wed, Dec 18, 2019 at 02:15:53PM -0800, John Hubbard wrote:
> On 12/18/19 7:52 AM, Kirill A. Shutemov wrote:
> > On Mon, Dec 16, 2019 at 02:25:13PM -0800, John Hubbard wrote:
> > > +static void put_compound_head(struct page *page, int refs)
> > > +{
> > > + /*
vmas arg is NULL)
> + * and return -ENOTSUPP if DAX isn't allowed in this case:
> + */
> + return __gup_longterm_locked(tsk, mm, start, nr_pages, pages,
> + vmas, gup_flags | FOLL_TOUCH |
> + FOLL_REMOTE);
> + }
>
> return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
> locked,
> --
> 2.24.1
>
--
Kirill A. Shutemov
e condition would save you an indentation level.
> + int count = page_ref_dec_return(page);
> +
> + /*
> + * devmap page refcounts are 1-based, rather than 0-based: if
> + * refcount is 1, then the page is free and the refcount is
> + * stable because nobody holds a reference on the page.
> + */
> + if (count == 1)
> + free_devmap_managed_page(page);
> + else if (!count)
> + __put_page(page);
> + }
> +
> + return is_devmap;
> +}
> +EXPORT_SYMBOL(put_devmap_managed_page);
> +#endif
> --
> 2.24.1
>
>
--
Kirill A. Shutemov
page);
> +}
It's not terribly efficient. Maybe something like:

	VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
	if (refs > 1)
		page_ref_sub(page, refs - 1);
	put_page(page);

?
--
Kirill A. Shutemov
.
Yes we do. MADV_DONTNEED is used a lot by userspace memory allocators and
it would be a very noticeable performance regression if we switched it to
down_write(mmap_sem).
--
Kirill A. Shutemov
On Mon, Oct 07, 2019 at 03:51:58PM +0200, Ingo Molnar wrote:
>
> * Kirill A. Shutemov wrote:
>
> > On Mon, Oct 07, 2019 at 03:06:17PM +0200, Ingo Molnar wrote:
> > >
> > > * Anshuman Khandual wrote:
> > >
> > > > This adds a t
ritten
as inline function + define. Something like:
#define mm_p4d_folded mm_p4d_folded
static inline bool mm_p4d_folded(struct mm_struct *mm)
{
	return !pgtable_l5_enabled();
}
But I don't see much reason to be more verbose here than needed.
--
Kirill A. Shutemov
On Fri, Sep 27, 2019 at 08:40:00PM -0300, Leonardo Bras wrote:
> As decribed, gup_pgd_range is a lockless pagetable walk. So, in order to
^ typo
--
Kirill A. Shutemov
wn) in pud_clear_tests() as there were no available
> __pgd() definitions.
>
> - ARM32
> - IA64
Hm. Grep shows __pgd() definitions for both of them. Is it for a specific
config?
--
Kirill A. Shutemov
it but its just a single line. Kirill suggested this in the
> previous version. There is a generic fallback definition but s390 has it's
> own. This change overrides the generic one for x86 probably as a fix or as
> an improvement. Kirill should be able to help classify it in which case it
> can be a separate patch.
I don't think it's worth a separate patch.
--
Kirill A. Shutemov
/highmem_32.c#L34
> >>
> >> I have not checked others, but I guess it is like that for all.
> >>
> >
> >
> > Seems like I answered too quickly. All kmap_atomic() do preempt_disable(),
> > but not all pte_alloc_map() call kmap_atomic().
> >
> > However, for instance ARM does:
> >
> > https://elixir.bootlin.com/linux/v5.3-rc8/source/arch/arm/include/asm/pgtable.h#L200
> >
> > And X86 as well:
> >
> > https://elixir.bootlin.com/linux/v5.3-rc8/source/arch/x86/include/asm/pgtable_32.h#L51
> >
> > Microblaze also:
> >
> > https://elixir.bootlin.com/linux/v5.3-rc8/source/arch/microblaze/include/asm/pgtable.h#L495
>
> All the above platforms checks out to be using k[un]map_atomic(). I am
> wondering whether
> any of the intermediate levels will have similar problems on any these 32 bit
> platforms
> or any other platforms which might be using generic k[un]map_atomic().
No. The kernel only allocates PTE page tables from highmem. All other page
tables are always visible in the kernel address space.
Kirill A. Shutemov
ll code here __init (or its variants) so it
can be discarded after boot. It has no use after that.
--
Kirill A. Shutemov
ry from generic code like this test case is bit tricky. That
> >> is because there are not enough helpers to create entries with an absolute
> >> value. This would have been easier if all the platforms provided functions
> >> like __pxx() which is not the case now. Otherwise something like this
> >> should
> >> have worked.
> >>
> >>
> >> pud_t pud = READ_ONCE(*pudp);
> >> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
> >> WRITE_ONCE(*pudp, pud);
> >>
> >> But __pud() will fail to build in many platforms.
> >
> > Hmm, I simply used this on my system to make pud_clear_tests() work, not
> > sure if it works on all archs:
> >
> > pud_val(*pudp) |= RANDOM_NZVALUE;
>
> Which compiles on arm64 but then fails on x86 because of the way pmd_val()
> has been defined there.
Use this instead:

	*pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE);

It *should* be more portable.
--
Kirill A. Shutemov
s(mm, pmdp, (pgtable_t) page);
- pud_populate_tests(mm, pudp, pmdp);
- p4d_populate_tests(mm, p4dp, pudp);
- pgd_populate_tests(mm, pgdp, p4dp);
+ pud_populate_tests(mm, pudp, saved_pmdp);
+ p4d_populate_tests(mm, p4dp, saved_pudp);
+ pgd_populate_tests(mm, pgdp, saved_p4dp);
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
--
Kirill A. Shutemov
table
> + * entries will be used for testing with random or garbage
> + * values. These saved addresses will be used for freeing
> + * page table pages.
> + */
> + saved_p4dp = p4d_offset(pgdp, 0UL);
> + saved_pudp = pud_offset(p4dp, 0UL);
> + saved_pmdp = pmd_offset(pudp, 0UL);
> + saved_ptep = pte_offset_map(pmdp, 0UL);
> +
> + pte_basic_tests(page, prot);
> + pmd_basic_tests(page, prot);
> + pud_basic_tests(page, prot);
> + p4d_basic_tests(page, prot);
> + pgd_basic_tests(page, prot);
> +
> + pte_clear_tests(ptep);
> + pmd_clear_tests(pmdp);
> + pud_clear_tests(pudp);
> + p4d_clear_tests(p4dp);
> + pgd_clear_tests(pgdp);
> +
> + pmd_populate_tests(mm, pmdp, (pgtable_t) page);
This is not correct for architectures that define pgtable_t as a pte_t
pointer rather than a struct page pointer.
> + pud_populate_tests(mm, pudp, pmdp);
> + p4d_populate_tests(mm, p4dp, pudp);
> + pgd_populate_tests(mm, pgdp, p4dp);
This is wrong. Each p?dp here points to the second entry of its page table,
which is not a valid page table pointer and triggers p?d_bad() on x86.
Use saved_p?dp instead.
> +
> + p4d_free(mm, saved_p4dp);
> + pud_free(mm, saved_pudp);
> + pmd_free(mm, saved_pmdp);
> + pte_free(mm, (pgtable_t) virt_to_page(saved_ptep));
> +
> + mm_dec_nr_puds(mm);
> + mm_dec_nr_pmds(mm);
> + mm_dec_nr_ptes(mm);
> + __mmdrop(mm);
> +
> + free_mapped_page(page);
> + return 0;
> +}
> +
> +static void __exit arch_pgtable_tests_exit(void) { }
> +
> +module_init(arch_pgtable_tests_init);
> +module_exit(arch_pgtable_tests_exit);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Anshuman Khandual ");
> +MODULE_DESCRIPTION("Test archicture page table helpers");
> --
> 2.20.1
>
>
--
Kirill A. Shutemov
ges in that case?
>
> The problem with the transparent_hugepage/enabled interface is that it
> conflates performing compaction work to produce THP-pages with the
> ability to map huge pages at all.
That's not [entirely] true. transparent_hugepage/defrag gates heavy-duty
compaction. We only do very limited compaction if it's not advised by
transparent_hugepage/defrag.
I believe DAX has to respect transparent_hugepage/enabled, or not
advertise its huge pages as THP. It's confusing for users.
--
Kirill A. Shutemov
ocated out of /dev/dax/ or
> /dev/pmem*. Do we have a reason not to use hugepages for mapping pages in
> that case?
Yes. Like when you don't want DAX to compete for TLB entries with a
mission-critical application (one which uses hugetlb, for instance).
--
Kirill A. Shutemov
address space.
It's probably worth recommending (void *)-1 as such an address.
> .\" Before Linux 2.6.24, the address was rounded up to the next page
> .\" boundary; since 2.6.24, it is rounded down!
> The address of the new mapping is returned as the result of the call.
> --
> 2.20.1.791.gb4d0f1c61a-goog
>
--
Kirill A. Shutemov
efine OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
Have you tested it with CONFIG_X86_5LEVEL=y?
AFAICS, the patch makes OBJ_INDEX_BITS and everything that depends on it
dynamic -- it depends on what paging mode we are booting in. ZS_SIZE_CLASSES
depends indirectly on OBJ_INDEX_BITS and I don't see how the struct zs_pool
definition can compile with a dynamic ZS_SIZE_CLASSES.
Hm?
--
Kirill A. Shutemov
On Wed, Oct 24, 2018 at 07:09:07PM -0700, Joel Fernandes wrote:
> On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote:
> > > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote:
>
d the address
for coloring. It's not needed anymore. The page allocator and SL?B are good
enough now.
See 3c936465249f ("[SPARC64]: Kill pgtable quicklists and use SLAB.")
--
Kirill A. Shutemov
On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote:
> On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote:
> > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > index
/* Set the new pmd */
> + set_pmd_at(mm, new_addr, new_pmd, pmd);
> + if (new_ptl != old_ptl)
> + spin_unlock(new_ptl);
> + spin_unlock(old_ptl);
> +
> + *need_flush = true;
> + return true;
> + }
> + return false;
> +}
> +
--
Kirill A. Shutemov
On Fri, Oct 12, 2018 at 05:42:24PM +0100, Anton Ivanov wrote:
>
> On 10/12/18 3:48 PM, Anton Ivanov wrote:
> > On 12/10/2018 15:37, Kirill A. Shutemov wrote:
> > > On Fri, Oct 12, 2018 at 03:09:49PM +0100, Anton Ivanov wrote:
> > > > On 10/12/18 2:37
On Fri, Oct 12, 2018 at 09:57:19AM -0700, Joel Fernandes wrote:
> On Fri, Oct 12, 2018 at 04:19:46PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Oct 12, 2018 at 05:50:46AM -0700, Joel Fernandes wrote:
> > > On Fri, Oct 12, 2018 at 02:30:56PM +0300, Kirill A. Shutemov wrote:
>
+
> > + /* Set the new pmd */
> > + set_pmd_at(mm, new_addr, new_pmd, pmd);
>
> UML does not have set_pmd_at at all
Every architecture does. :)
But it may not come from the arch code.
> If I read the code right, MIPS completely ignores the address argument so
> set_pmd_at there may not have the effect which this patch is trying to
> achieve.
Ignoring the address is fine. Most architectures do that.
The idea is to move the page table to the new pmd slot. It has nothing to do
with the address passed to set_pmd_at().
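For reference, a simplified sketch of the pmd-level move being discussed
(based on the hunk quoted above, not the exact patch):

	static bool move_normal_pmd_sketch(struct vm_area_struct *vma,
			unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
	{
		struct mm_struct *mm = vma->vm_mm;
		spinlock_t *old_ptl, *new_ptl;
		pmd_t pmd;

		old_ptl = pmd_lock(mm, old_pmd);
		new_ptl = pmd_lockptr(mm, new_pmd);
		if (new_ptl != old_ptl)
			spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);

		/* Detach the whole PTE page from the old slot... */
		pmd = *old_pmd;
		pmd_clear(old_pmd);

		/* ...and re-attach it at the new one. The address argument only
		 * tells the arch where the entry lives; ignoring it is fine. */
		set_pmd_at(mm, new_addr, new_pmd, pmd);

		if (new_ptl != old_ptl)
			spin_unlock(new_ptl);
		spin_unlock(old_ptl);

		return true;
	}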
--
Kirill A. Shutemov
On Fri, Oct 12, 2018 at 05:50:46AM -0700, Joel Fernandes wrote:
> On Fri, Oct 12, 2018 at 02:30:56PM +0300, Kirill A. Shutemov wrote:
> > On Thu, Oct 11, 2018 at 06:37:56PM -0700, Joel Fernandes (Google) wrote:
> > > Android needs to mremap large regions of memory during
On Fri, Oct 12, 2018 at 02:30:56PM +0300, Kirill A. Shutemov wrote:
> On Thu, Oct 11, 2018 at 06:37:56PM -0700, Joel Fernandes (Google) wrote:
> > @@ -239,7 +287,21 @@ unsigned long move_page_tables(struct vm_area_struct
> > *vma,
> > split_huge_pmd(v
continue;
> + } else if (extent == PMD_SIZE) {
Hm. What guarantees that new_addr is PMD_SIZE-aligned?
It's not obvious to me.
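Something along these lines would make the assumption explicit (an
illustrative helper, not part of the patch):

	/* The pmd-level move is only valid when both source and destination
	 * are PMD-aligned, not just when the extent covers a full PMD. */
	static inline bool can_move_pmd(unsigned long old_addr,
					unsigned long new_addr,
					unsigned long extent)
	{
		return extent == PMD_SIZE &&
		       IS_ALIGNED(old_addr, PMD_SIZE) &&
		       IS_ALIGNED(new_addr, PMD_SIZE);
	}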
--
Kirill A. Shutemov
pte_quicklist = (unsigned long *)(*ret);
> - ret[0] = 0;
> - pgtable_cache_size--;
> - }
> - return (pte_t *)ret;
> -}
> -
Ditto.
--
Kirill A. Shutemov
ss)
{
tlb_flush_pgtable(tlb, address);
- pgtable_page_dtor(table);
pgtable_free_tlb(tlb, page_address(table), 0);
}
#endif /* _ASM_POWERPC_PGALLOC_32_H */
--
Kirill A. Shutemov
ts any
address, not restricted to the 47-bit address space. It doesn't mean the
application *requires* the address to be above the 47-bit boundary.
At least on x86, -1 just shifts the upper boundary of the address range
where we can look for an unmapped area.
--
Kirill A. Shutemov
> >
> > The code was first introduced with commit( 83e3c48: mm/sparsemem:
> > Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y)
Any chance to bisect it?
Could you check if the commit just before 83e3c48729d9 is fine?
--
Kirill A. Shutemov
Worst case scenario? Like when we go far enough into the speculative code
path on every page fault and then fall back to the normal page fault?
--
Kirill A. Shutemov
n't switch to large address space if hint_addr + len > 128TB.
> > The decision to switch to large address space is primarily based on hint
> > addr
>
> But does the mmap succeed in that case or not?
>
> ie: mmap(0x7000, 0x2000, ...) = ?
It does, but the resulting address doesn't match the hint. It's somewhere
below the 47-bit border.
--
Kirill A. Shutemov
t; 2) For everything else we search in < 128TB space if hint addr is below
> 128TB
>
> 3) We don't switch to large address space if hint_addr + len > 128TB. The
> decision to switch to large address space is primarily based on hint addr
>
> Is there any other rule we need to outline? Or is any of the above not
> correct?
That's correct.
--
Kirill A. Shutemov
On Tue, Nov 07, 2017 at 02:05:42PM +0100, Florian Weimer wrote:
> On 11/07/2017 12:44 PM, Kirill A. Shutemov wrote:
> > On Tue, Nov 07, 2017 at 12:26:12PM +0100, Florian Weimer wrote:
> > > On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote:
> > >
> > > > >
ttempts MAP_FIXED allocation
> of addr + len above 128TB might use high bits of pointer returned by
> that library because those are never satisfied today and the library
> would fall back.
If you want to point out that it's an ABI break, then yes, it is.
But we allow ABI breaks as long as nobody notices. I think it's reasonable
to expect that nobody relies on such corner cases.
If we found any piece of software affected by the change, we would need
to reconsider.
--
Kirill A. Shutemov
On Tue, Nov 07, 2017 at 12:26:12PM +0100, Florian Weimer wrote:
> On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote:
>
> > > First of all, using addr and MAP_FIXED to develop our heuristic can
> > > never really give unchanged ABI. It's an in-band signal. brk()
l see
> out-of-range addresses, but I expected a full opt-out based on RLIMIT_AS
> would be sufficient for them.
Just use mmap(-1), without MAP_FIXED, to get the full address space.
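For example (illustrative userspace code):

	#include <stdio.h>
	#include <sys/mman.h>

	int main(void)
	{
		/* A hint above the 47-bit boundary (here simply -1) opts this one
		 * allocation into the full address space; no MAP_FIXED needed. */
		void *p = mmap((void *)-1, 1UL << 20, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p != MAP_FAILED)
			printf("mapped at %p\n", p);
		return 0;
	}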
--
Kirill A. Shutemov
et
> our user virtual address bits on a fine grained basis. Maybe a
> sysctl, maybe a personality. Something out-of-band. I don't wan to
> get too far into that discussion yet. First we need to agree whether
> or not the code in the tree today is a problem.
Well, we've discussed all the options you are proposing before.
Linus wanted a minimalistic interface, so we took this path for now.
We can always add more ways to get access to the full address space later.
--
Kirill A. Shutemov
p: enable thp migration in generic path")
> Reported-and-tested-by: Abdul Haleem
> Signed-off-by: Zi Yan
> Cc: "Kirill A. Shutemov"
> Cc: Anshuman Khandual
> Cc: Andrew Morton
Acked-by: Kirill A. Shutemov
--
Kirill A. Shutemov
/*
> + * We need to re-validate the VMA after checking the bounds, otherwise
> + * we might have a false positive on the bounds.
> + */
> + if (read_seqcount_retry(&vma->vm_sequence, seq))
> + goto unlock;
> +
> + ret = handle_pte_fault(&vmf);
> +
> +unlock:
> + srcu_read_unlock(&vma_srcu, idx);
> + return ret;
> +
> +out_walk:
> + local_irq_enable();
> + goto unlock;
> +}
> +#endif /* __HAVE_ARCH_CALL_SPF */
> +
> /*
> * By the time we get here, we already hold the mm semaphore
> *
> --
> 2.7.4
>
--
Kirill A. Shutemov
On Thu, Aug 10, 2017 at 10:27:50AM +0200, Laurent Dufour wrote:
> On 10/08/2017 02:58, Kirill A. Shutemov wrote:
> > On Wed, Aug 09, 2017 at 12:43:33PM +0200, Laurent Dufour wrote:
> >> On 09/08/2017 12:12, Kirill A. Shutemov wrote:
> >>> On Tue, Aug 08, 2017 at 04
On Wed, Aug 09, 2017 at 12:43:33PM +0200, Laurent Dufour wrote:
> On 09/08/2017 12:12, Kirill A. Shutemov wrote:
> > On Tue, Aug 08, 2017 at 04:35:38PM +0200, Laurent Dufour wrote:
> >> The VMA sequence count has been introduced to allow fast detection of
> >> VMA modif
's anywhere near a complete list of places where we touch
vm_flags. What is your plan for the rest?
--
Kirill A. Shutemov
pte, vmf->orig_pte))) {
> if (old_page) {
> if (!PageAnon(old_page)) {
--
Kirill A. Shutemov
tp://lkml.kernel.org/r/20170615145224.66200-1-kirill.shute...@linux.intel.com
--
Kirill A. Shutemov
ven't had a chance to narrow it down yet.
Please check if the patch at this link helps:
http://lkml.kernel.org/r/20170313052213.11411-1-kirill.shute...@linux.intel.com
--
Kirill A. Shutemov
Residue of an earlier implementation, perhaps? Delete it.
>
> Fixes: 953c66c2b22a ("mm: THP page cache support for ppc64")
> Signed-off-by: Hugh Dickins
Sorry that I missed this initially.
Acked-by: Kirill A. Shutemov
--
Kirill A. Shutemov
..@vger.kernel.org
> Cc: sparcli...@vger.kernel.org
> Signed-off-by: Dmitry Safonov
I've noticed this too.
Acked-by: Kirill A. Shutemov
--
Kirill A. Shutemov
hp_split_page 51518
> thp_split_page_failed 1
> thp_deferred_split_page 73566
> thp_split_pmd 665
> thp_zero_page_alloc 3
> thp_zero_page_alloc_failed 0
>
> Signed-off-by: Aneesh Kumar K.V
One nit-pick below, but otherwise
Acked-by: Kirill A. Shutemov
> @@ -2975,6 +3004,1
On Fri, Nov 11, 2016 at 05:42:11PM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov" writes:
>
> > On Mon, Nov 07, 2016 at 02:04:41PM +0530, Aneesh Kumar K.V wrote:
> >> @@ -2953,6 +2966,13 @@ static int do_set_pmd(struct fault_env *fe, struc
MEM handling?
I think we should do this way before this point. Maybe in do_fault() or
something.
--
Kirill A. Shutemov
ned-off-by: Aneesh Kumar K.V
Acked-by: Kirill A. Shutemov
--
Kirill A. Shutemov
long addr)
> {
> @@ -1359,6 +1367,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct
> vm_area_struct *vma,
> atomic_long_dec(&tlb->mm->nr_ptes);
> add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> } else {
> + if (arch_needs_pgtable_deposit())
Just hide the arch_needs_pgtable_deposit() check in zap_deposited_table().
> + zap_deposited_table(tlb->mm, pmd);
> add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR);
> }
> spin_unlock(ptl);
--
Kirill A. Shutemov
ld Schaefer wrote:
> >> On Tue, 23 Feb 2016 13:32:21 +0300
> >> "Kirill A. Shutemov" wrote:
> >> > The theory is that the splitting bit effetely masked bogus pmd_present():
> >> > we had pmd_trans_splitting() in all code path and that prevented mm
AGE_SIZE) {
+ for (i = 0; i < HPAGE_PMD_NR; i++) {
page_remove_rmap(page + i, false);
put_page(page + i);
}
--
Kirill A. Shutemov
is the purpose behind the BUG_ON.
I would guess that requesting a pin on a non-reclaimable page is considered
useless, meaning suspicious behavior. BUG_ON() is overkill, I think.
WARN_ON_ONCE() would do.
Not that this follow_huge_addr() on Power is not reachable via
do_move_page_to_node_array(), becaus
00644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -490,7 +490,7 @@ static inline int pud_bad(pud_t pud)
static inline int pmd_present(pmd_t pmd)
{
- return pmd_val(pmd) != _SEGMENT_ENTRY_INVALID;
+ return !(pmd_val(pmd) & _SEGMEN
On Thu, Feb 18, 2016 at 04:00:37PM +0100, Gerald Schaefer wrote:
> On Thu, 18 Feb 2016 01:58:08 +0200
> "Kirill A. Shutemov" wrote:
>
> > On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote:
> > > On Sat, 13 Feb 2016 12:58:31 +010
memory.c that check the same?
>
> This behavior is not new, it was the same before the THP rework, so I do not
> assume that it is related to the current problems, maybe with the exception
> of this specific crash. I never saw the BUG at mm/huge_
On Tue, Feb 16, 2016 at 05:24:44PM +0100, Gerald Schaefer wrote:
> On Mon, 15 Feb 2016 23:35:26 +0200
> "Kirill A. Shutemov" wrote:
>
> > Is there any chance that I'll be able to trigger the bug using QEMU?
> > Does anybody have an QEMU image I can use?
>
On Mon, Feb 15, 2016 at 07:37:02PM +0100, Gerald Schaefer wrote:
> On Mon, 15 Feb 2016 13:31:59 +0200
> "Kirill A. Shutemov" wrote:
>
> > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:
> > >
> > > On Sat, 13 Feb 2016, Kirill A. Shutem
On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:
>
> On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> > Could you check if revert of fecffad25458 helps?
>
> I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:
>
> [ 1851.721062] Unable
On Fri, Feb 12, 2016 at 06:16:40PM +0100, Gerald Schaefer wrote:
> On Fri, 12 Feb 2016 16:57:27 +0100
> Christian Borntraeger wrote:
>
> > On 02/12/2016 04:41 PM, Kirill A. Shutemov wrote:
> > > On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> > &
On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> On Thu, 11 Feb 2016 21:09:42 +0200
> "Kirill A. Shutemov" wrote:
>
> > On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > > Hi,
> > >
> > > Sebastian Ott re
On Thu, Feb 11, 2016 at 09:09:42PM +0200, Kirill A. Shutemov wrote:
> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > Hi,
> >
> > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > he also bisected this to commit 61f5d698 &