Hi all (especially mm people!),
On 14/10/2024 11:58, Ryan Roberts wrote:
> arm64 can support multiple base page sizes. Instead of selecting a page
> size at compile time, as is done today, we will make it possible to
> select the desired page size on the command line.
>
>
udonym for "maintainer" so you were
missed off the original post. Apologies!
More context in cover letter:
https://lore.kernel.org/all/20241014105514.3206191-1-ryan.robe...@arm.com/
On 14/10/2024 11:58, Ryan Roberts wrote:
> arm64 can support multiple base page sizes. Instead of sel
On 15/10/2024 04:04, Pingfan Liu wrote:
> On Mon, Oct 14, 2024 at 10:07 PM Ryan Roberts wrote:
>>
>> On 14/10/2024 14:54, Pingfan Liu wrote:
>>> Hello Ryan,
>>>
>>> On Mon, Oct 14, 2024 at 11:58:08AM +0100, Ryan Roberts wrote:
>>>> arm64 can
On 14/10/2024 14:54, Pingfan Liu wrote:
> Hello Ryan,
>
> On Mon, Oct 14, 2024 at 11:58:08AM +0100, Ryan Roberts wrote:
>> arm64 can support multiple base page sizes. Instead of selecting a page
>> size at compile time, as is done today, we will make it possible to
>> s
t for the non-const
case. Or #if/#else/#endif within a function can be converted to C
if/else blocks, which are also dead-code stripped for the const case.
Sometimes we can change the C preprocessor logic to use the
appropriate MIN/MAX limit.
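As a rough illustration of that second option, here is a minimal
sketch; the helper name and the values it returns are invented for
illustration and are not taken from the series:

static inline int example_levels(void)
{
	/*
	 * Previously resolved by the preprocessor:
	 *
	 *   #if PAGE_SHIFT == 16
	 *		return 13;
	 *   #else
	 *		return 9;
	 *   #endif
	 *
	 * As plain C, the compiler still folds the branch and strips
	 * the dead arm while PAGE_SHIFT is a compile-time constant,
	 * and the same code keeps working once PAGE_SHIFT becomes a
	 * runtime value.
	 */
	if (PAGE_SHIFT == 16)
		return 13;
	return 9;
}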
Signed-off-by: Ryan Roberts
---
***NOTE***
Any
On 25/06/2024 15:06, Matthew Wilcox wrote:
> On Tue, Jun 25, 2024 at 02:41:18PM +0100, Ryan Roberts wrote:
>> On 25/06/2024 14:06, Matthew Wilcox wrote:
>>> On Tue, Jun 25, 2024 at 01:41:02PM +0100, Ryan Roberts wrote:
>>>> On 25/06/2024 13:37, Baolin Wang wrote:
>
On 25/06/2024 14:06, Matthew Wilcox wrote:
> On Tue, Jun 25, 2024 at 01:41:02PM +0100, Ryan Roberts wrote:
>> On 25/06/2024 13:37, Baolin Wang wrote:
>>
>> [...]
>>
>>>>> For other filesystems, like ext4, I did not find the logic to determine
>>>
On 25/06/2024 13:37, Baolin Wang wrote:
[...]
>>> For other filesystems, like ext4, I did not find the logic to determine what
>>> size of folio to allocate in writable mmap() path
>>
>> Yes, I'd be keen to understand this too. When I was doing contpte, page cache
>> would only allocate large folio
On 25/06/2024 08:23, Baolin Wang wrote:
>
>
> On 2024/6/25 11:16, Kefeng Wang wrote:
>>
>>
>> On 2024/6/24 23:56, Ryan Roberts wrote:
>>> + Baolin Wang and Yin Fengwei, who maybe able to help with this.
>>>
>>>
>>> Hi Kefeng,
>>
90413.git.baolin.w...@linux.alibaba.com/
[2]
https://lore.kernel.org/linux-mm/13939ade-a99a-4075-8a26-9be7576b7...@arm.com/
> nr_pages = 1;
> } else if (nr_pages > 1) {
> pgoff_t idx = folio_page_idx(folio, page);
>
>
> On 2024
On 02/04/2024 17:20, Peter Xu wrote:
> On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
>> On 02.04.24 16:48, Ryan Roberts wrote:
>>> Hi Peter,
>
> Hey, Ryan,
>
> Thanks for the report!
>
>>>
>>> On 27/03/2024 15:23, pet...@r
On 02/04/2024 17:00, Matthew Wilcox wrote:
> On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
>>> The oops trigger is at mm/gup.c:778:
>>> VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
>>>
>>> So 2M passed ok, and it's failing for 32M, which is cont-pmd. I'm
Hi Peter,
On 27/03/2024 15:23, pet...@redhat.com wrote:
> From: Peter Xu
>
> Now follow_page() is ready to handle hugetlb pages in whatever form, and
> over all architectures. Switch to the generic code path.
>
> Time to retire hugetlb_follow_page_mask(), following the previous
> retirement of
>
> Some of them look like mm-unstable issues. For example, arm64 fails with
>
> CC arch/arm64/mm/extable.o
> In file included from ./include/linux/hugetlb.h:828,
> from security/commoncap.c:19:
> ./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of
> 'arch_clea
On 19/02/2024 15:18, Catalin Marinas wrote:
> On Fri, Feb 16, 2024 at 12:53:43PM +0000, Ryan Roberts wrote:
>> On 16/02/2024 12:25, Catalin Marinas wrote:
>>>> On Thu, Feb 15, 2024 at 10:31:59AM +0000, Ryan Roberts wrote:
>>>> +pte_t contpte_ptep
On 16/02/2024 19:54, John Hubbard wrote:
> On 2/16/24 08:56, Catalin Marinas wrote:
> ...
>>> The problem is that the contpte_* symbols are called from the ptep_* inline
>>> functions. So where those inlines are called from modules, we need to make
>>> sure
>>> the contpte_* symbols are available.
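For concreteness, the pattern under discussion looks roughly like the
sketch below; the symbol name and the choice of export macro are
illustrative assumptions, not quoted from the patch:

/* arch/arm64/mm/contpte.c (sketch) */
#include <linux/export.h>

pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
{
	/* ...accumulate access/dirty across the contpte block... */
	return orig_pte;
}
EXPORT_SYMBOL_GPL(contpte_ptep_get);

Because ptep_get() and friends are inlines in a header, any module that
calls them ends up referencing the out-of-line contpte helpers, hence
the need for the exports.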
Hi Catalin,
Thanks for the review! Comments below...
On 16/02/2024 12:25, Catalin Marinas wrote:
> On Thu, Feb 15, 2024 at 10:31:59AM +0000, Ryan Roberts wrote:
>> arch/arm64/mm/contpte.c | 285 +++
>
> Nitpick: I think most symbols in c
ssued.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 67
arch/arm64/mm/contpte.c | 17
2 files changed, 84 insertions(+)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta
behaviour when 'Misprogramming the Contiguous bit'. See
section D21194 at https://developer.arm.com/documentation/102105/ja-07/
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 61 ++--
arch/arm64/mm/contpte.c | 3
sheuvel
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/Kconfig | 9 +
arch/arm64/include/asm/pgtable.h | 167 ++
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/contpte.c | 285 +++
include/linux/efi
let's convert it to directly call ptep_get_and_clear().
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/mm/hugetlbpage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 27f6160890d1..48e8b4298
if needed.
Reviewed-by: David Hildenbrand
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/tlbflush.h | 13 +++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/tlbflush.h
b/arch/arm64/include/asm/tlbflush.h
index 1d
.
The following APIs are treated this way:
- ptep_get
- set_pte
- set_ptes
- pte_clear
- ptep_get_and_clear
- ptep_test_and_clear_young
- ptep_clear_flush_young
- ptep_set_wrprotect
- ptep_set_access_flags
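The wrapping pattern for these entry points looks roughly like the
sketch below; __ptep_get(), pte_valid_cont() and contpte_ptep_get()
are used here as illustrative names for the arch-private reader, the
contiguous-bit test and the out-of-line helper:

/* arch/arm64/include/asm/pgtable.h (sketch) */
static inline pte_t ptep_get(pte_t *ptep)
{
	pte_t pte = __ptep_get(ptep);	/* arch-private raw read */

	if (likely(!pte_valid_cont(pte)))
		return pte;		/* fast path: not a contpte mapping */

	return contpte_ptep_get(ptep, pte);	/* out-of-line contpte handling */
}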
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm
contptes.
Acked-by: David Hildenbrand
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
include/linux/pgtable.h | 21 +
mm/memory.c | 19 ---
2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/include/linux/pgtable.h b/include
with order-0 folios (the common case).
Acked-by: Mark Rutland
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d759a20d2929
_at() -> set_ptes(nr=1)) and only
when we are setting the final PTE in a contpte-aligned block.
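A sketch of the "final PTE in the block" check (the helper name and
exact form are illustrative, not lifted from the patch):

/*
 * A contpte block spans CONT_PTE_SIZE bytes (CONT_PTES entries), so
 * folding is only worth attempting once the entry being written is the
 * last one in its naturally aligned block.
 */
static inline bool addr_is_last_in_contpte_block(unsigned long addr)
{
	return ((addr + PAGE_SIZE) & (CONT_PTE_SIZE - 1)) == 0;
}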
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 26 +
arch/arm64/mm/contpte.c | 64
2 files changed, 90 insertions(+)
diff --git
their
iterators to skip getting the contpte tail ptes when gathering the batch
of ptes to operate on. This results in the number of PTE reads returning
to 1 per pte.
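The mechanism is a per-architecture batching hint; a sketch of what it
can look like for arm64 contpte (treat the details as illustrative
rather than the literal patch):

/* arch/arm64/include/asm/pgtable.h (sketch) */
#define pte_batch_hint pte_batch_hint
static inline unsigned int pte_batch_hint(pte_t *ptep, pte_t pte)
{
	if (!pte_valid_cont(pte))
		return 1;	/* not a contpte mapping: no hint */

	/*
	 * Entries remaining up to the end of this contpte block (8-byte
	 * ptes assumed); the caller may skip reading those tail ptes.
	 */
	return CONT_PTES - (((unsigned long)ptep >> 3) & (CONT_PTES - 1));
}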
Acked-by: Mark Rutland
Reviewed-by: David Hildenbrand
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include
() rather
than the arch-private __set_ptes().
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 2 +-
arch/arm64/kernel/mte.c | 2 +-
arch/arm64/kvm/guest.c | 2 +-
arch/arm64/mm/fault.c| 2 +-
arch/arm64/mm/hugetlbpage.c
support. In this case, ptep_get() will become more complex so we now
have all the code abstracted through it.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 12 +---
arch/arm64/kernel/efi.c | 2 +-
arch/arm64/mm/fault.c| 4
Core-mm needs to be able to advance the pfn by an arbitrary amount, so
override the new pte_advance_pfn() API to do so.
Signed-off-by: Ryan Roberts
---
arch/x86/include/asm/pgtable.h | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b
Now that the all architecture overrides of pte_next_pfn() have been
replaced with pte_advance_pfn(), we can simplify the definition of the
generic pte_next_pfn() macro so that it is unconditionally defined.
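With the per-arch overrides gone, the generic definition collapses to
a one-line alias, roughly:

/* include/linux/pgtable.h (sketch) */
#define pte_next_pfn(pte)	pte_advance_pfn(pte, 1)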
Signed-off-by: Ryan Roberts
---
include/linux/pgtable.h | 2 --
1 file changed, 2
Core-mm needs to be able to advance the pfn by an arbitrary amount, so
override the new pte_advance_pfn() API to do so.
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b
overriding architecture's
pte_next_pfn() to pte_advance_pfn().
Signed-off-by: Ryan Roberts
---
include/linux/pgtable.h | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 231370e1b80f..b7ac8358f2aa 100644
--- a/in
set as a batch, the contpte blocks
can be initially set up pre-folded (once the arm64 contpte support is
added in the next few patches). This leads to noticeable performance
improvement during split.
Acked-by: David Hildenbrand
Signed-off-by: Ryan Roberts
---
mm/huge_memory.c | 58
lore.kernel.org/linux-mm/633af0a7-0823-424f-b6ef-374d99483...@arm.com/
[6] https://lore.kernel.org/lkml/08c16f7d-f3b3-4f22-9acc-da943f647...@arm.com/
[7] https://lore.kernel.org/linux-mm/20240214204435.167852-1-da...@redhat.com/
[8]
https://lore.kernel.org/linux-mm/c507308d-bdd4-5f9e-d4ff-e96e4520b.
s must initially be not-present. All
set_ptes() callsites already conform to this requirement. Stating it
explicitly is useful because it allows for a simplification to the
upcoming arm64 contpte implementation.
Acked-by: David Hildenbrand
Signed-off-by: Ryan Roberts
---
include/linux/pgtable
On 13/02/2024 17:44, Mark Rutland wrote:
> On Fri, Feb 02, 2024 at 08:07:56AM +0000, Ryan Roberts wrote:
>> There are situations where a change to a single PTE could cause the
>> contpte block in which it resides to become foldable (i.e. could be
>> repainted with the
On 13/02/2024 16:43, Mark Rutland wrote:
> On Fri, Feb 02, 2024 at 08:07:52AM +0000, Ryan Roberts wrote:
>> Optimize the contpte implementation to fix some of the
>> exit/munmap/dontneed performance regression introduced by the initial
>> contpte commit. Subsequent patches w
On 13/02/2024 16:31, Mark Rutland wrote:
> On Fri, Feb 02, 2024 at 08:07:51AM +0000, Ryan Roberts wrote:
>> Optimize the contpte implementation to fix some of the fork performance
>> regression introduced by the initial contpte commit. Subsequent patches
>> will solve it e
On 12/02/2024 16:24, David Hildenbrand wrote:
> On 12.02.24 16:34, Ryan Roberts wrote:
>> On 12/02/2024 15:26, David Hildenbrand wrote:
>>> On 12.02.24 15:45, Ryan Roberts wrote:
>>>> On 12/02/2024 13:54, David Hildenbrand wrote:
>>>>>>> If
On 13/02/2024 14:08, Ard Biesheuvel wrote:
> On Tue, 13 Feb 2024 at 15:05, David Hildenbrand wrote:
>>
>> On 13.02.24 15:02, Ryan Roberts wrote:
>>> On 13/02/2024 13:45, David Hildenbrand wrote:
>>>> On 13.02.24 14:33, Ard Biesheuvel wrote:
>>>>>
On 13/02/2024 13:45, David Hildenbrand wrote:
> On 13.02.24 14:33, Ard Biesheuvel wrote:
>> On Tue, 13 Feb 2024 at 14:21, Ryan Roberts wrote:
>>>
>>> On 13/02/2024 13:13, David Hildenbrand wrote:
>>>> On 13.02.24 14:06, Ryan Roberts wrote:
>>>
On 13/02/2024 13:22, David Hildenbrand wrote:
> On 13.02.24 14:20, Ryan Roberts wrote:
>> On 13/02/2024 13:13, David Hildenbrand wrote:
>>> On 13.02.24 14:06, Ryan Roberts wrote:
>>>> On 13/02/2024 12:19, David Hildenbrand wrote:
>>>>> On 13.02.24 13:06
On 13/02/2024 13:13, David Hildenbrand wrote:
> On 13.02.24 14:06, Ryan Roberts wrote:
>> On 13/02/2024 12:19, David Hildenbrand wrote:
>>> On 13.02.24 13:06, Ryan Roberts wrote:
>>>> On 12/02/2024 20:38, Ryan Roberts wrote:
>>>>> [...]
>>
On 13/02/2024 12:19, David Hildenbrand wrote:
> On 13.02.24 13:06, Ryan Roberts wrote:
>> On 12/02/2024 20:38, Ryan Roberts wrote:
>>> [...]
>>>
>>>>>>> +static inline bool mm_is_user(struct mm_struct *mm)
>>>>>>> +{
>>>
On 13/02/2024 12:02, Mark Rutland wrote:
> On Mon, Feb 12, 2024 at 12:59:57PM +0000, Ryan Roberts wrote:
>> On 12/02/2024 12:00, Mark Rutland wrote:
>>> Hi Ryan,
>
> [...]
>
>>>> +static inline void set_pte(pte_t *ptep, pte_t pte)
>>>> +{
>
On 12/02/2024 20:38, Ryan Roberts wrote:
> [...]
>
>>>>> +static inline bool mm_is_user(struct mm_struct *mm)
>>>>> +{
>>>>> + /*
>>>>> + * Don't attempt to apply the contig bit to kernel mappings, because
>
On 12/02/2024 14:29, David Hildenbrand wrote:
> On 12.02.24 15:10, Ryan Roberts wrote:
>> On 12/02/2024 12:14, David Hildenbrand wrote:
>>> On 02.02.24 09:07, Ryan Roberts wrote:
>>>> The goal is to be able to advance a PTE by an arbitrary number of PFNs.
>>>
[...]
+static inline bool mm_is_user(struct mm_struct *mm)
+{
+ /*
+ * Don't attempt to apply the contig bit to kernel mappings, because
+ * dynamically adding/removing the contig bit can cause page faults.
+ * These racing faults are ok for user space, since t
On 12/02/2024 13:43, David Hildenbrand wrote:
> On 02.02.24 09:07, Ryan Roberts wrote:
>> Some architectures (e.g. arm64) can tell from looking at a pte, if some
>> follow-on ptes also map contiguous physical memory with the same pgprot.
>> (for arm64, these are contpte
On 12/02/2024 15:26, David Hildenbrand wrote:
> On 12.02.24 15:45, Ryan Roberts wrote:
>> On 12/02/2024 13:54, David Hildenbrand wrote:
>>>>> If so, I wonder if we could instead do that comparison modulo the
>>>>> access/dirty
>>>>> bits,
>
On 12/02/2024 12:59, Ryan Roberts wrote:
> On 12/02/2024 12:00, Mark Rutland wrote:
>> Hi Ryan,
>>
>> Overall this looks pretty good; I have a bunch of minor comments below, and a
>> bigger question on the way ptep_get_lockless() works.
>
> OK great - thanks f
On 12/02/2024 13:43, David Hildenbrand wrote:
> On 02.02.24 09:07, Ryan Roberts wrote:
>> Some architectures (e.g. arm64) can tell from looking at a pte, if some
>> follow-on ptes also map contiguous physical memory with the same pgprot.
>> (for arm64, these are contpte
On 12/02/2024 13:54, David Hildenbrand wrote:
>>> If so, I wonder if we could instead do that comparison modulo the
>>> access/dirty
>>> bits,
>>
>> I think that would work - but will need to think a bit more on it.
>>
>>> and leave ptep_get_lockless() only reading a single entry?
>>
>> I think we
On 12/02/2024 12:14, David Hildenbrand wrote:
> On 02.02.24 09:07, Ryan Roberts wrote:
>> The goal is to be able to advance a PTE by an arbitrary number of PFNs.
>> So introduce a new API that takes a nr param.
>>
>> We are going to remove pte_next_pfn() and replace
On 12/02/2024 13:15, David Hildenbrand wrote:
> On 12.02.24 14:05, Ryan Roberts wrote:
>> On 12/02/2024 12:44, David Hildenbrand wrote:
>>> On 02.02.24 09:07, Ryan Roberts wrote:
>>>> Split __flush_tlb_range() into __flush_tlb_range_nosync() +
>>>> _
On 12/02/2024 12:44, David Hildenbrand wrote:
> On 02.02.24 09:07, Ryan Roberts wrote:
>> Split __flush_tlb_range() into __flush_tlb_range_nosync() +
>> __flush_tlb_range(), in the same way as the existing flush_tlb_page()
>> arrangement. This allows calling __flush_tlb_range_
Fri, Feb 02, 2024 at 08:07:50AM +0000, Ryan Roberts wrote:
>> With the ptep API sufficiently refactored, we can now introduce a new
>> "contpte" API layer, which transparently manages the PTE_CONT bit for
>> user mappings.
>>
>> In this initial implementation, o
On 12/02/2024 11:05, David Hildenbrand wrote:
> On 12.02.24 11:56, David Hildenbrand wrote:
>> On 12.02.24 11:32, Ryan Roberts wrote:
>>> On 12/02/2024 10:11, David Hildenbrand wrote:
>>>> Hi Ryan,
>>>>
>>>>>> -static void tlb_b
On 12/02/2024 10:11, David Hildenbrand wrote:
> Hi Ryan,
>
>>> -static void tlb_batch_pages_flush(struct mmu_gather *tlb)
>>> +static void __tlb_batch_free_encoded_pages(struct mmu_gather_batch *batch)
>>> {
>>> - struct mmu_gather_batch *batch;
>>> -
>>> - for (batch = &tlb->local; batch
w. If we ever have a cheap
> folio_mapcount(), we might just want to check for underflows there.
>
> To keep small folios as fast as possible force inlining of a specialized
> variant using __always_inline with nr=1.
>
> Signed-off-by: David Hildenbrand
Reviewed-by: Ryan Roberts
On 09/02/2024 22:15, David Hildenbrand wrote:
> It's a pain that we have to handle cond_resched() in
> tlb_batch_pages_flush() manually and cannot simply handle it in
> release_pages() -- release_pages() can be called from atomic context.
> Well, in a perfect world we wouldn't have to make our code
l __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
> - bool delay_rmap, int page_size)
> +static bool __tlb_remove_folio_pages_size(struct mmu_gather *tlb,
> + struct page *page, unsigned int nr_pages, bool delay_rmap,
> + int page_siz
On 09/02/2024 22:15, David Hildenbrand wrote:
> Let's prepare for further changes by factoring out processing of present
> PTEs.
>
> Signed-off-by: David Hildenbrand
Reviewed-by: Ryan Roberts
> ---
> mm/memory.c | 94 ++--
On 09/02/2024 22:16, David Hildenbrand wrote:
>>> 1) Convert READ_ONCE() -> ptep_get()
>>> 2) Convert set_pte_at() -> set_ptes()
>>> 3) All the "New layer" renames and addition of the trivial wrappers
>>
>> Yep that makes sense. I'll start prepping that today. I'll hold off reposting
>> until I hav
On 08/02/2024 17:34, Mark Rutland wrote:
> On Fri, Feb 02, 2024 at 08:07:31AM +0000, Ryan Roberts wrote:
>> Hi All,
>
> Hi Ryan,
>
> I assume this is the same as your 'features/granule_perf/contpte-lkml_v'
> branch
> on https://gitlab.arm.com/linux-arm/
_at() -> set_ptes(nr=1)) and only
when we are setting the final PTE in a contpte-aligned block.
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 26 +
arch/arm64/mm/contpte.c | 64
2 files changed, 90 insertions(+)
diff --git
with order-0 folios (the common case).
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 353ea67b5d75..cdc310880a3b 100644
--- a
their
iterators to skip getting the contpte tail ptes when gathering the batch
of ptes to operate on. This results in the number of PTE reads returning
to 1 per pte.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 9 +
1 file changed, 9 insertions
contptes.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
include/linux/pgtable.h | 18 ++
mm/memory.c | 20 +---
2 files changed, 31 insertions(+), 7 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index
ssued.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 67
arch/arm64/mm/contpte.c | 17
2 files changed, 84 insertions(+)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta
behaviour when 'Misprogramming the Contiguous bit'. See
section D21194 at https://developer.arm.com/documentation/102105/latest/
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 61 ++--
arch/arm64/mm/contpte.c |
the young bit from a contiguous range of ptes.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/tlbflush.h | 13 +++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/tlbflush.h
b/arch/arm64/include/asm/tlbflu
lts to enabled as long as its dependency,
TRANSPARENT_HUGEPAGE, is also enabled. The core-mm depends upon
TRANSPARENT_HUGEPAGE to be able to allocate large folios, so if it's not
enabled, then there is no chance of meeting the physical contiguity
requirement for contpte mappings.
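A sketch of what such a Kconfig entry can look like (the symbol name
and help text here are assumptions, not quoted from the patch):

config ARM64_CONTPTE
	bool "Contiguous PTE mappings for user memory" if EXPERT
	depends on TRANSPARENT_HUGEPAGE
	default y
	help
	  When enabled, user mappings are configured using the PTE
	  contiguous bit for any mappings that meet the size and
	  alignment requirements, reducing TLB pressure.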
Tested-by: J
.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 77a8b100e1cd..2870bc12f288 100644
--- a/arch/arm64/include
.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 18 +++---
1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 5f560326116e..77a8b100e1cd 100644
--- a/arch
.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 5 +++--
arch/arm64/mm/hugetlbpage.c | 6 +++---
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 3b0ff58109c5
.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 3 ++-
arch/arm64/mm/fixmap.c | 2 +-
arch/arm64/mm/hugetlbpage.c | 2 +-
arch/arm64/mm/mmu.c | 2 +-
4 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/arm64
managing their own loop. This is left for future
improvement.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 10 +-
arch/arm64/kernel/mte.c | 2 +-
arch/arm64/kvm/guest.c | 2 +-
arch/arm64/mm/fault.c| 2 +-
arch
.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 11 +++
arch/arm64/kernel/efi.c | 2 +-
arch/arm64/mm/fixmap.c | 2 +-
arch/arm64/mm/kasan_init.c | 4 ++--
arch/arm64/mm/mmu.c | 2 +-
arch/arm64/mm
.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 10 ++
arch/arm64/mm/fault.c| 6 +++---
arch/arm64/mm/hugetlbpage.c | 2 +-
3 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch
.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 10 ++
arch/arm64/mm/hugetlbpage.c | 2 +-
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2870bc12f288
() so convert those to the private API. While
other callsites were doing direct READ_ONCE(), so convert those to use
the appropriate (public/private) API too.
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 12 +---
arch/arm64/kernel/efi.c
Core-mm needs to be able to advance the pfn by an arbitrary amount, so
improve the API to do so and change the name.
Signed-off-by: Ryan Roberts
---
arch/x86/include/asm/pgtable.h | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch
Core-mm needs to be able to advance the pfn by an arbitrary amount, so
improve the API to do so and change the name.
Signed-off-by: Ryan Roberts
---
arch/arm64/include/asm/pgtable.h | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b
Core-mm needs to be able to advance the pfn by an arbitrary amount, so
improve the API to do so and change the name.
Signed-off-by: Ryan Roberts
---
arch/powerpc/mm/pgtable.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm
Now that the architectures are converted over to pte_advance_pfn(), we
can remove the pte_next_pfn() wrapper and convert the callers to call
pte_advance_pfn().
Signed-off-by: Ryan Roberts
---
include/linux/pgtable.h | 9 +
mm/memory.c | 4 ++--
2 files changed, 3 insertions
Core-mm needs to be able to advance the pfn by an arbitrary amount, so
improve the API to do so and change the name.
Signed-off-by: Ryan Roberts
---
arch/arm/mm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index c24e29c0b9a4
incrementally switch the
architectures over. Once all arches are moved over, we will change all
the core-mm callers to call pte_advance_pfn() directly and remove the
wrapper.
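Roughly, the transitional arrangement looks like this (the guards and
the PFN_PTE_SHIFT-based default are a sketch of the idea, not the
literal diff):

/* include/linux/pgtable.h (sketch) */
#ifndef pte_advance_pfn
static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
{
	return __pte(pte_val(pte) + (nr << PFN_PTE_SHIFT));
}
#endif

/*
 * Temporary wrapper so existing pte_next_pfn() callers keep working
 * while architectures are converted one at a time.
 */
#ifndef pte_next_pfn
#define pte_next_pfn(pte)	pte_advance_pfn(pte, 1)
#endif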
Signed-off-by: Ryan Roberts
---
include/linux/pgtable.h | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git
set as a batch, the contpte blocks
can be initially set up pre-folded (once the arm64 contpte support is
added in the next few patches). This leads to noticeable performance
improvement during split.
Acked-by: David Hildenbrand
Signed-off-by: Ryan Roberts
---
mm/huge_memory.c | 58
s must initially be not-present. All
set_ptes() callsites already conform to this requirement. Stating it
explicitly is useful because it allows for a simplification to the
upcoming arm64 contpte implementation.
Signed-off-by: Ryan Roberts
---
include/linux/pgtable.h | 4
1 file chan
.com/
[5] https://lore.kernel.org/lkml/08c16f7d-f3b3-4f22-9acc-da943f647...@arm.com/
[6] https://lore.kernel.org/lkml/20240129124649.189745-1-da...@redhat.com/
[7] https://lore.kernel.org/lkml/20240129143221.263763-1-da...@redhat.com/
[8]
https://lore.kernel.org/linux-mm/c507308d-bdd4-5f9e-d4ff-e96e4520b.
On 31/01/2024 15:05, David Hildenbrand wrote:
> On 31.01.24 16:02, Ryan Roberts wrote:
>> On 31/01/2024 14:29, David Hildenbrand wrote:
>>>>> Note that regarding NUMA effects, I mean when some memory access within
>>>>> the
>>>>> same
>>
On 31/01/2024 14:29, David Hildenbrand wrote:
>>> Note that regarding NUMA effects, I mean when some memory access within the
>>> same
>>> socket is faster/slower even with only a single node. On AMD EPYC that's
>>> possible, depending on which core you are running and on which memory
>>> control
On 31/01/2024 13:38, David Hildenbrand wrote:
Nope: looks the same. I've taken my test harness out of the picture and
done
everything manually from the ground up, with the old tests and the new.
Headline
is that I see similar numbers from both.
>>>
>>> It took me a while to
On 31/01/2024 12:56, David Hildenbrand wrote:
> On 31.01.24 13:37, Ryan Roberts wrote:
>> On 31/01/2024 11:49, Ryan Roberts wrote:
>>> On 31/01/2024 11:28, David Hildenbrand wrote:
>>>> On 31.01.24 12:16, Ryan Roberts wrote:
>>>>> On 31/01/2024 11:06, D
On 31/01/2024 11:49, Ryan Roberts wrote:
> On 31/01/2024 11:28, David Hildenbrand wrote:
>> On 31.01.24 12:16, Ryan Roberts wrote:
>>> On 31/01/2024 11:06, David Hildenbrand wrote:
>>>> On 31.01.24 11:43, Ryan Roberts wrote:
>>>>> On 29/01/2024 12:46,
On 31/01/2024 11:28, David Hildenbrand wrote:
> On 31.01.24 12:16, Ryan Roberts wrote:
>> On 31/01/2024 11:06, David Hildenbrand wrote:
>>> On 31.01.24 11:43, Ryan Roberts wrote:
>>>> On 29/01/2024 12:46, David Hildenbrand wrote:
>>>>> Now that the
On 31/01/2024 11:06, David Hildenbrand wrote:
> On 31.01.24 11:43, Ryan Roberts wrote:
>> On 29/01/2024 12:46, David Hildenbrand wrote:
>>> Now that the rmap overhaul[1] is upstream that provides a clean interface
>>> for rmap batching, let's implement PTE batc
oft-dirty, let the caller specify
> using flags
>
> [1] https://lkml.kernel.org/r/20231220224504.646757-1-da...@redhat.com
> [2] https://lkml.kernel.org/r/20231218105100.172635-1-ryan.robe...@arm.com
> [3] https://lkml.kernel.org/r/20230809083256.699513-1-da...@redhat.com