With appropriate compiler support [1], KASAN builds use __asan prefixed
memintrinsics, and KASAN no longer overrides memcpy/memset/memmove.
If compiler support is detected (CC_HAS_KASAN_MEMINTRINSIC_PREFIX),
define memintrinsics normally (do not prefix '__').
On powerpc, KASAN is the only user o
On 26.02.23 21:13, Geert Uytterhoeven wrote:
Hi David,
Hi Geert,
On Fri, Jan 13, 2023 at 6:16 PM David Hildenbrand wrote:
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
from the type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen b
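The bit-stealing idea can be sketched in user space as follows. This is only an illustration under assumed names and an assumed layout, not the code of any architecture: the low 5 bits (the width the text gives for MAX_SWAPFILES_SHIFT) keep the type, the next bit, formerly part of the type field, becomes the exclusive marker, and the rest holds the offset. SWP_TYPE_BITS, SWP_EXCLUSIVE_BIT and the swp_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only:
 *   bits 0-4  swap type (5 bits, as generic MM needs)
 *   bit  5    exclusive marker (stolen from the old type field)
 *   bits 6+   swap offset
 */
#define SWP_TYPE_BITS      5u
#define SWP_TYPE_MASK      ((1u << SWP_TYPE_BITS) - 1u)
#define SWP_EXCLUSIVE_BIT  (1u << SWP_TYPE_BITS)
#define SWP_OFFSET_SHIFT   (SWP_TYPE_BITS + 1u)

/* Build an entry from a type and an offset; the exclusive bit starts clear. */
static inline uint32_t swp_make(uint32_t type, uint32_t offset)
{
	assert(type <= SWP_TYPE_MASK);	/* the type must fit in 5 bits */
	return (offset << SWP_OFFSET_SHIFT) | type;
}

static inline uint32_t swp_type(uint32_t e)
{
	return e & SWP_TYPE_MASK;
}

static inline uint32_t swp_offset(uint32_t e)
{
	return e >> SWP_OFFSET_SHIFT;
}

/* Set, clear and test the stolen bit without touching type or offset. */
static inline uint32_t swp_mkexclusive(uint32_t e)
{
	return e | SWP_EXCLUSIVE_BIT;
}

static inline uint32_t swp_clear_exclusive(uint32_t e)
{
	return e & ~SWP_EXCLUSIVE_BIT;
}

static inline bool swp_exclusive(uint32_t e)
{
	return (e & SWP_EXCLUSIVE_BIT) != 0;
}
```

Because the marker sits outside both the type mask and the offset shift, setting or clearing it leaves the other two fields unchanged.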
On Sun, Feb 26, 2023 at 02:12:38PM -0800, Andrew Morton wrote:
> On Fri, 3 Feb 2023 17:18:37 +1000 Nicholas Piggin wrote:
>
> > On a 16-socket 192-core POWER8 system, the context_switch1_threads
> > benchmark from will-it-scale (see earlier changelog), upstream can
> > achieve a rate of about 1
On Thu, Feb 23, 2023 at 10:24:19PM +0100, Andrzej Hajda wrote:
> On 22.02.2023 18:04, Peter Zijlstra wrote:
> > On Wed, Jan 18, 2023 at 04:35:22PM +0100, Andrzej Hajda wrote:
> >
> > > Andrzej Hajda (7):
> > >arch: rename all internal names __xchg to __arch_xchg
> > >linux/include: add non
Hi David,
On Mon, Feb 27, 2023 at 2:31 PM David Hildenbrand wrote:
> On 26.02.23 21:13, Geert Uytterhoeven wrote:
> > On Fri, Jan 13, 2023 at 6:16 PM David Hildenbrand wrote:
> >> Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit
> >> from the type. Generic MM currently only uses 5
Unlike PVR_POWER8, etc., PVR_7450 represents a full PVR
value and not a family value.
To avoid confusion, do as for the E500 family and define the relevant
PVR_VER_ values for the 7450 family:
0x8000 ==> 7450
0x8001 ==> 7455
0x8002 ==> 7447
0x8003 ==> 7447A
0x8004 ==> 7448
And use the
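A minimal sketch of how such version values could be used, assuming the version field is the upper 16 bits of the PVR. The values come from the list above; the macro names PVR_VER_7450 through PVR_VER_7448 and the helper pvr_is_7450_family() are assumptions made for this illustration and may not match the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: the processor version lives in the upper 16 bits of the PVR. */
#define PVR_VER(pvr)	(((pvr) >> 16) & 0xFFFFu)

/* Values from the list above; the names are illustrative. */
#define PVR_VER_7450	0x8000u
#define PVR_VER_7455	0x8001u
#define PVR_VER_7447	0x8002u
#define PVR_VER_7447A	0x8003u
#define PVR_VER_7448	0x8004u

/* Hypothetical helper: true when the PVR belongs to the 7450 family. */
static inline bool pvr_is_7450_family(uint32_t pvr)
{
	switch (PVR_VER(pvr)) {
	case PVR_VER_7450:
	case PVR_VER_7455:
	case PVR_VER_7447:
	case PVR_VER_7447A:
	case PVR_VER_7448:
		return true;
	default:
		return false;
	}
}
```

Comparing only the version field means the revision bits in the lower half of the PVR do not affect the match.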
/*
* Externally used page protection values.
diff --git a/arch/microblaze/include/asm/pgtable.h
b/arch/microblaze/include/asm/pgtable.h
index 42f5988e998b..7e3de54bf426 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -131,10 +131,10 @@ exte
On Fri, 24 Feb 2023, freak07 wrote:
Here are some measurements from a Pixel 7 Pro that's running a kernel either
with the Per-VMA locks patchset or without.
If there's interest I can provide results of other specific apps as well.
Results are from consecutive cold app launches issued with "am
On Mon, Feb 27, 2023 at 9:19 AM Davidlohr Bueso wrote:
>
> On Fri, 24 Feb 2023, freak07 wrote:
>
> >Here are some measurements from a Pixel 7 Pro that's running a kernel either
> >with the Per-VMA locks patchset or without.
> >If there's interest I can provide results of other specific apps as we
On Fri, Feb 24, 2023 at 8:19 AM Suren Baghdasaryan wrote:
>
> On Fri, Feb 24, 2023 at 8:14 AM Liam R. Howlett
> wrote:
> >
> > * Suren Baghdasaryan [230223 21:06]:
> > > On Thu, Feb 23, 2023 at 5:46 PM Liam R. Howlett
> > > wrote:
> > > >
> > > > * Suren Baghdasaryan [230223 16:16]:
> > > >
Previous versions:
v3: https://lore.kernel.org/all/20230216051750.3125598-1-sur...@google.com/
v2: https://lore.kernel.org/lkml/20230127194110.533103-1-sur...@google.com/
v1: https://lore.kernel.org/all/20230109205336.3665937-1-sur...@google.com/
RFC: https://lore.kernel.org/all/20220901173516.7021
From: Liam Howlett
ma_pivots() and ma_data_end() may be called with a dead node. Ensure
that the node isn't dead before using the returned values.
This is necessary for RCU mode of the maple tree.
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett
Signed
From: Liam Howlett
When initially starting a search, the root node may already be in the
process of being replaced in RCU mode. Detect and restart the walk if
this is the case. This is necessary for RCU mode of the maple tree.
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-o
From: Liam Howlett
The walk to destroy the nodes was not always setting the node type and
would result in a destroy method potentially using the values as nodes.
Avoid this by setting the correct node types. This is necessary for the
RCU mode of the maple tree.
Fixes: 54a611b60590 ("Maple Tree:
From: Liam Howlett
The call to mte_set_dead_node() before the smp_wmb() already calls
smp_wmb() so this is not needed. This is an optimization for the RCU
mode of the maple tree.
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett
Signed-off-by: Suren Baghdas
From: "Liam R. Howlett"
During the development of the maple tree, the strategy of freeing
multiple nodes changed and, in the process, the pivots were reused to
store pointers to dead nodes. To ensure the readers see accurate
pivots, the writers need to mark the nodes as dead and call smp_wmb() t
From: "Liam R. Howlett"
Add an smp_rmb() before reading the parent pointer to ensure that
anything read from the node prior to the parent pointer hasn't been
reordered ahead of this check.
This is necessary for RCU mode.
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: L
From: "Liam R. Howlett"
Dereferencing RCU objects within the RCU callback without the RCU check
has caused lockdep to complain. Fix the RCU dereferencing by using the
RCU callback lock to ensure the operation is safe.
Also stop creating a new lock to use for dereferencing during
destruction of
From: "Liam R. Howlett"
Use the maple tree in RCU mode for VMA tracking. This is necessary for
the use of per-VMA locking. RCU mode is enabled by default but disabled
when exiting an mm and for the new tree during a fork.
Also enable RCU for the tree used in munmap operations to ensure the
nod
This configuration variable will be used to build the support for VMA
locking during page fault handling.
This is enabled on supported architectures with SMP and MMU set.
The architecture support is needed since the page fault handler is called
from the architecture's page faulting code which nee
From: Michel Lespinasse
This prepares for page faults handling under VMA lock, looking up VMAs
under protection of an rcu read lock, instead of the usual mmap read lock.
Signed-off-by: Michel Lespinasse
Signed-off-by: Suren Baghdasaryan
---
include/linux/mm_types.h | 13 ++---
kernel/
Move mmap_lock assert function definitions up so that they can be used
by other mmap_lock routines.
Signed-off-by: Suren Baghdasaryan
---
include/linux/mmap_lock.h | 24
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/include/linux/mmap_lock.h b/include/l
Introduce per-VMA locking. The lock implementation relies on
per-vma and per-mm sequence counters to note exclusive locking:
- read lock - (implemented by vma_start_read) requires the vma
(vm_lock_seq) and mm (mm_lock_seq) sequence counters to differ.
If they match then there must be a
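The read-lock rule stated above (the two counters must differ) can be modelled in a few lines. This is a single-threaded sketch with hypothetical names, with no atomics or real locking. The write side is an assumption added so the model can be exercised: a writer is assumed to mark a VMA by copying the mm counter into it, and bumping the mm counter is assumed to release all marked VMAs at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the mm and VMA structures; only the counters matter. */
struct mm_model {
	int mm_lock_seq;
};

struct vma_model {
	struct mm_model *mm;
	int vm_lock_seq;
};

/* From the text: a read lock is only possible while the counters differ. */
static bool vma_model_start_read(const struct vma_model *vma)
{
	assert(vma != NULL && vma->mm != NULL);
	return vma->vm_lock_seq != vma->mm->mm_lock_seq;
}

/* Assumption: a writer marks the VMA by copying the mm counter into it. */
static void vma_model_start_write(struct vma_model *vma)
{
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
}

/* Assumption: bumping the mm counter unmarks every write-locked VMA. */
static void vma_model_end_write_all(struct mm_model *mm)
{
	mm->mm_lock_seq++;
}
```

In this model a single counter increment makes every previously marked VMA readable again, without visiting the VMAs one by one.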
Updates to vm_flags have to be done with VMA marked as being written for
preventing concurrent page faults or other modifications.
Signed-off-by: Suren Baghdasaryan
---
include/linux/mm.h | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/mm.h b/include/
vma_prepare() acquires all locks required before VMA modifications.
Move vma_prepare() before vma_adjust_trans_huge() so that VMA is locked
before any modification.
Signed-off-by: Suren Baghdasaryan
---
mm/mmap.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/mm/m
Protect VMA from concurrent page fault handler while collapsing a huge
page. Page fault handler needs a stable PMD to use PTL and relies on
per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(),
set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will
not be detected
Write-lock all VMAs which might be affected by a merge, split, expand
or shrink operation. All these operations use vma_prepare() before
making the modifications, therefore it provides a centralized place to
perform VMA locking.
Signed-off-by: Suren Baghdasaryan
---
mm/mmap.c | 10 ++
1
Write-lock the VMA before copying it and when copy_vma produces
a new VMA.
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Laurent Dufour
---
mm/mmap.c | 1 +
mm/mremap.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c
index e73fbb84ce12..1f42b9a52b9b 100644
-
Write-locking VMAs before isolating them ensures that page fault
handlers don't operate on isolated VMAs.
Signed-off-by: Suren Baghdasaryan
---
mm/mmap.c | 1 +
mm/nommu.c | 5 +
2 files changed, 6 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c
index 1f42b9a52b9b..f7ed357056c4 100644
---
Normally free_pgtables needs to lock affected VMAs except for the case
when VMAs were isolated under VMA write-lock. munmap() does just that,
isolating while holding appropriate locks and then downgrading mmap_lock
and dropping per-VMA locks before freeing page tables.
Add a parameter to free_pgtab
Assert there are no holders of VMA lock for reading when it is about to be
destroyed.
Signed-off-by: Suren Baghdasaryan
---
kernel/fork.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index e1dd79c7738c..bdb55f25895d 100644
--- a/kernel/fork.c
+++ b/kernel/
Page fault handlers might need to fire MMU notifications while a new
notifier is being registered. Modify mm_take_all_locks to write-lock all
VMAs and prevent this race with page fault handlers that would hold VMA
locks. VMAs are locked before i_mmap_rwsem and anon_vma to keep the same
locking orde
Per-vma locking mechanism will search for VMA under RCU protection and
then after locking it, has to ensure it was not removed from the VMA
tree after we found it. To make this check efficient, introduce a
vma->detached flag to mark VMAs which were removed from the VMA tree.
Signed-off-by: Suren B
Introduce lock_vma_under_rcu function to lookup and lock a VMA during
page fault handling. When VMA is not found, can't be locked or changes
after being locked, the function returns NULL. The lookup is performed
under RCU protection to prevent the found VMA from being destroyed before
the VMA lock
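The three NULL cases described above, together with the fallback to mmap_lock-based handling mentioned elsewhere in this series, can be sketched as a toy model. The names, the array-based lookup and the boolean flags are all hypothetical stand-ins: there is no RCU or real locking here, and the kernel function's signature is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vma_model {
	unsigned long start, end;	/* covers [start, end) */
	bool write_locked;		/* stand-in for "can't be locked" */
	bool detached;			/* stand-in for "changed after being locked" */
};

/*
 * Toy lookup: return the VMA covering addr, or NULL when it is not found,
 * cannot be read-locked, or turns out to have changed after locking.
 */
static struct vma_model *lock_vma_model(struct vma_model *vmas, size_t n,
					unsigned long addr)
{
	assert(vmas != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct vma_model *vma = &vmas[i];

		if (addr < vma->start || addr >= vma->end)
			continue;
		if (vma->write_locked)
			return NULL;	/* could not take the read lock */
		if (vma->detached)
			return NULL;	/* changed after it was locked */
		return vma;
	}
	return NULL;			/* no VMA covers addr */
}

enum fault_path { FAULT_VMA_LOCK, FAULT_MMAP_LOCK };

/* A caller tries the per-VMA path first and falls back on NULL. */
static enum fault_path handle_fault_model(struct vma_model *vmas, size_t n,
					  unsigned long addr)
{
	return lock_vma_model(vmas, n, addr) ? FAULT_VMA_LOCK : FAULT_MMAP_LOCK;
}
```

Returning NULL for every failure mode keeps the caller simple: any NULL means "retry under mmap_lock", whatever the reason.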
Add a new flag to distinguish page faults handled under protection of
per-vma lock.
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Laurent Dufour
---
include/linux/mm.h | 3 ++-
include/linux/mm_types.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/mm.
When vma->anon_vma is not set, page fault handler will set it by either
reusing anon_vma of an adjacent VMA if VMAs are compatible or by
allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
a compatible adjacent VMA and that requires not only the faulting VMA
to be stable but also
Due to the possibility of do_swap_page dropping mmap_lock, abort fault
handling under VMA lock and retry holding mmap_lock. This can be handled
more gracefully in the future.
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Laurent Dufour
---
mm/memory.c | 5 +
1 file changed, 5 insertions(+)
Due to the possibility of handle_userfault dropping mmap_lock, avoid fault
handling under VMA lock and retry holding mmap_lock. This can be handled
more gracefully in the future.
Signed-off-by: Suren Baghdasaryan
Suggested-by: Peter Xu
---
mm/memory.c | 9 +
1 file changed, 9 insertions
Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra
statistics about handling page fault under VMA lock.
Signed-off-by: Suren Baghdasaryan
---
include/linux/vm_event_item.h | 6 ++
include/linux/vmstat.h| 6 ++
mm/Kconfig.debug | 6 ++
mm/memory.c
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
Signed-off-by: Suren Baghdasaryan
---
arch/x86/Kconfig| 1 +
arch/x86/mm/fault.c | 36
2 files changed, 37 insertions(+)
diff --git a
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
Signed-off-by: Suren Baghdasaryan
---
arch/arm64/Kconfig| 1 +
arch/arm64/mm/fault.c | 36
2 files changed, 37 insertions(+)
diff --g
From: Laurent Dufour
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
Copied from "x86/mm: try VMA lock-based page fault handling first"
Signed-off-by: Laurent Dufour
Signed-off-by: Suren Baghdasaryan
---
arch/powerpc/mm/f
call_rcu() can take a long time when callback offloading is enabled.
Its use in the vm_area_free can cause regressions in the exit path when
multiple VMAs are being freed.
Because exit_mmap() is called only after the last mm user drops its
refcount, the page fault handlers can't be racing with it.
vma->lock being part of the vm_area_struct causes performance regression
during page faults because during contention its count and owner fields
are constantly updated and having other parts of vm_area_struct used
during page fault handling next to them causes constant cache line
bouncing. Fix that
Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
per-folio.
I'm unsure about my merging of flush_dcache_icache_hugepage() and
flush_dcache_icache_page() into flush_dcache_icache_folio() and subsequent
removal
Hi,
Le 27/02/2023 à 18:57, Matthew Wilcox (Oracle) a écrit :
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
> per-folio.
>
> I'm unsure about my merging of flush_dcache_icache_hugepage() and
> flush_dca
Hi David,
On Mon, Feb 27, 2023 at 6:01 PM David Hildenbrand wrote:
> /*
> * Externally used page protection values.
> diff --git a/arch/microblaze/include/asm/pgtable.h
> b/arch/microblaze/include/asm/pgtable.h
> index 42f5988e998b..7e3de54bf426 100644
> ---
On Sun, Feb 12, 2023, at 09:46, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)"
>
> asm/agp.h is duplicated in several architectures, with x86 being the
> only instance that differs from the rest.
>
> Introduce asm-generic/agp.h and use it instead of per-architecture
> headers for the most case
On Mon, Feb 27, 2023 at 07:45:08PM +, Christophe Leroy wrote:
> Hi,
>
> Le 27/02/2023 à 18:57, Matthew Wilcox (Oracle) a écrit :
> > Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> > Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
> > per-folio.
> >
On Sat, Feb 25, 2023 at 4:51 PM Arnd Bergmann wrote:
>
> On Sat, Feb 25, 2023, at 17:50, Paul Gortmaker wrote:
> > [RE: [RFC PATCH 0/4] Remove some e300/MPC83xx evaluation platforms] On
> > 24/02/2023 (Fri 21:16) Leo Li wrote:
> >
> > Thanks for confirming with your marketing team that they "do no
On Sat, Feb 25, 2023 at 10:52 AM Paul Gortmaker
wrote:
>
> [RE: [RFC PATCH 0/4] Remove some e300/MPC83xx evaluation platforms] On
> 24/02/2023 (Fri 21:16) Leo Li wrote:
>
> >
> >
> > > -Original Message-
> > > From: Paul Gortmaker
> > > Sent: Monday, February 20, 2023 5:59 AM
> > > To: l
On Tue, Feb 21, 2023 at 1:52 PM Paul Gortmaker
wrote:
>
> [This RFC is proposed for v6.4 and hence is based off linux-next.]
>
> In a similar theme to the e300/MPC83xx evaluation platform removal[1],
> this targets removal of some 13 --> 21 year old e500/MPC85xx evaluation
> boards that were produ
On Mon, 27 Feb 2023 10:47:27 +0100 Marco Elver wrote:
> With appropriate compiler support [1], KASAN builds use __asan prefixed
> memintrinsics, and KASAN no longer overrides memcpy/memset/memmove.
>
> If compiler support is detected (CC_HAS_KASAN_MEMINTRINSIC_PREFIX),
> define memintrinsics no
The x86 Control-flow Enforcement Technology (CET) feature includes a new
type of memory called shadow stack. This shadow stack memory has some
unusual properties, which requires some core mm changes to function
properly.
One of these unusual properties is that shadow stack memory is writable,
but
On 2/23/23 22:25, Michael Ellerman wrote:
There's code in prom_instantiate_sml() to do a "SML handover" (Stored
Measurement Log) from OF to Linux, before Linux shuts down Open
Firmware.
This involves creating a buffer to hold the SML, and creating two device
tree properties to record its base
On Mon, 27 Feb 2023 at 23:16, Andrew Morton wrote:
>
> On Mon, 27 Feb 2023 10:47:27 +0100 Marco Elver wrote:
>
> > With appropriate compiler support [1], KASAN builds use __asan prefixed
> > memintrinsics, and KASAN no longer overrides memcpy/memset/memmove.
> >
> > If compiler support is detect
dd logs info to stderr by default. This info is pointless in the
selftests and makes legitimate issues harder to spot.
Pass the option to silence the info logs. Actual errors would still be
printed.
Signed-off-by: Benjamin Gray
---
tools/testing/selftests/powerpc/mm/Makefile | 2 +-
1 file chan
There are several messages being logged to stderr when building the PowerPC
selftests:
$ make -j$(nproc) O=build -C tools/testing/selftests \
INSTALL_PATH="$PWD"/out/selftests TARGETS=powerpc install > /dev/null
Makefile:50: warning: overriding recipe for target 'clean'
../../lib.mk:124
The CLEAN macro was added in 337f1e36 to prevent the
Makefile:50: warning: overriding recipe for target 'clean'
../../lib.mk:124: warning: ignoring old recipe for target 'clean'
style warnings. Expand its use to fix another case of redefining a
target directly.
Signed-off-by: Benjamin G
Make supports passing the 'jobserver' (parallel make support) to child
invocations of make when either
1. The target command uses $(MAKE) directly
2. The command starts with '+'
This context is not passed through expansions that result in $(MAKE), so
the macros used in several plac
On Mon, Feb 27, 2023 at 06:08:31PM -0500, Stefan Berger wrote:
>
>
> On 2/23/23 22:25, Michael Ellerman wrote:
> > There's code in prom_instantiate_sml() to do a "SML handover" (Stored
> > Measurement Log) from OF to Linux, before Linux shuts down Open
> > Firmware.
> >
> > This involves creatin
Walks the stack when copy_{to,from}_user address is in the stack to
ensure that the object being copied is entirely a single stack frame and
does not contain stack metadata.
Substantially similar to the x86 implementation. The back chain is used
to traverse the stack and identify stack frame bound
Le 27/02/2023 à 21:20, Matthew Wilcox a écrit :
> On Mon, Feb 27, 2023 at 07:45:08PM +, Christophe Leroy wrote:
>> Hi,
>>
>> Le 27/02/2023 à 18:57, Matthew Wilcox (Oracle) a écrit :
>>> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
>>> Change the PG_arch_1 (aka PG_dcache_