On Fri, 12 Apr 2019, Josh Poimboeuf wrote:
> v2:
> - docs improvements: [Randy, Michael]
> - Rename to "mitigations=" [Michael]
> - Add cpu_mitigations_off() function wrapper [Michael]
> - x86: Simplify logic [Boris]
> - powerpc: Fix no_rfi_flush checking bug (use '&&' instead of '||')
> - arm64:
On 4/4/19 11:32 am, Matthew Garrett wrote:
diff --git a/Documentation/ABI/testing/lockdown
b/Documentation/ABI/testing/lockdown
new file mode 100644
index ..5bd51e20917a
--- /dev/null
+++ b/Documentation/ABI/testing/lockdown
@@ -0,0 +1,19 @@
+What: security/lockdown
+Date:
On Mon, 15 Apr 2019 09:17:10 -0700
Linus Torvalds wrote:
> On Sun, Apr 14, 2019 at 10:19 PM Christoph Hellwig wrote:
> >
> > Can we please have the page refcount overflow fixes out on the list
> > for review, even if it is after the fact?
>
> They were actually on a list for review long befor
Hi,
Kindly ignore this series, since patch 5/5 in this series doesn't
incorporate the event-format change
that I've done in v4 of this series.
Apologies for the inconvenience. I will post the updated v5 soon.
Thanks,
Anju
On 4/15/19 3:41 PM, Anju T Sudhakar wrote:
IMC (In-Memory collect
IMC (In-Memory Collection counters) is a hardware monitoring facility
that collects a large number of hardware performance events.
POWER9 supports two modes for IMC, Accumulation mode and
Trace mode. In Accumulation mode, event counts are accumulated in
system memory.
Add the macros needed for IMC (In-Memory Collection Counters) trace-mode
and data structure to hold the trace-imc record data.
Also, add the new type "OPAL_IMC_COUNTERS_TRACE" in 'opal-api.h', since
there is a new switch case added in the opal-calls for IMC.
Signed-off-by: Anju T Sudhakar
Reviewe
From: Madhavan Srinivasan
Add code to restrict user access to the thread_imc PMU, since
some events report privilege-level information.
Fixes: f74c89bd80fb3 ('powerpc/perf: Add thread IMC PMU support')
Signed-off-by: Madhavan Srinivasan
Signed-off-by: Anju T Sudhakar
---
arch/powerpc/perf/imc-pmu.c
LDBAR holds the memory address allocated for each CPU. For thread-imc,
the mode bit (i.e. bit 1) of LDBAR is set to accumulation.
Currently, LDBAR is loaded with the per-cpu memory address and the mode
set to accumulation at boot time.
To enable trace-imc, the mode bit of LDBAR should be set to 'trace'. So
this patch detects trace-imc events, does memory initializations for
each online CPU, and registers cpuhotplug callbacks.
Signed-off-by: Anju T Sudhakar
Reviewed-by: Madhavan Srinivasan
---
arch/powerpc/perf/imc-pmu.c | 104 ++
arch/powerpc/platforms/powernv/opal-im
Add PMU functions to support trace-imc.
Signed-off-by: Anju T Sudhakar
Reviewed-by: Madhavan Srinivasan
---
arch/powerpc/perf/imc-pmu.c | 205 +++-
1 file changed, 204 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-
* hpa:
> Using symbol versioning doesn't really help much since the real
> problem is that struct termios can be passed around in userspace, and
> the interfaces between user space libraries don't have any
> versioning. However, my POC code deals with that too by only seeing
> BOTHER when necessar
This patch series maps all the kernel regions (vmalloc, IO and vmemmap)
using the 0xc top nibble address. This brings the hash translation
kernel mapping in sync
with radix.
Each of these regions can now map 512TB. We use one context to map these
regions and hence the 512TB limit. We also update radix to u
This makes it easy to update the region mapping in a later patch.
Signed-off-by: Aneesh Kumar K.V
---
arch/powerpc/include/asm/book3s/64/hash.h| 3 ++-
arch/powerpc/include/asm/book3s/64/pgtable.h | 8 +---
arch/powerpc/include/asm/book3s/64/radix.h | 1 +
arch/powerpc/mm/hash_utils_6
This patch maps the vmalloc, IO and vmemmap regions in the 0xc address
range instead of the current 0xd and 0xf ranges. This brings the mapping
closer to radix translation mode.
With the hash 64K page size, each of these regions is 512TB, whereas
with the 4K config we are limited by the max page table range of 64TB an
This adds an explicit check in various functions.
Signed-off-by: Aneesh Kumar K.V
---
arch/powerpc/mm/hash_utils_64.c | 18 +++---
arch/powerpc/mm/pgtable-hash64.c | 13 ++---
arch/powerpc/mm/pgtable-radix.c | 16
arch/powerpc/mm/pgtable_64.c | 5 +
All the regions are now mapped with the top nibble 0xc. Hence the region
id check is not needed for virt_addr_valid().
Signed-off-by: Aneesh Kumar K.V
---
arch/powerpc/include/asm/page.h | 12
1 file changed, 12 deletions(-)
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/in
This reduces multiple comparisons in get_region_id to a bit shift operation.
Signed-off-by: Aneesh Kumar K.V
---
arch/powerpc/include/asm/book3s/64/hash-4k.h | 4 ++-
arch/powerpc/include/asm/book3s/64/hash-64k.h | 1 +
arch/powerpc/include/asm/book3s/64/hash.h | 31 +--
a
This helps in debugging. We can look at the dmesg to find out
different kernel mapping details.
On 4K config this shows
kernel vmalloc start = 0xc0001000
kernel IO start= 0xc0002000
kernel vmemmap start = 0xc0003000
On 64K config:
kernel vmalloc start =
We now have
4K page size config
kernel_region_map_size = 16TB
kernel vmalloc start = 0xc0001000
kernel IO start= 0xc0002000
kernel vmemmap start = 0xc0003000
with 64K page size config:
kernel_region_map_size = 512TB
kernel vmalloc start = 0xc00800
On 4/16/19 3:14 PM, Anju T Sudhakar wrote:
Hi,
Kindly ignore this series, since patch 5/5 in this series doesn't
incorporate the event-format change
that I've done in v4 of this series.
Apologies for the inconvenience. I will post the updated v5 soon.
s/v5/v4
Thanks,
Anju
On 4/15/
The region actually points to the linear map. Rename the #define to
clarify that.
Signed-off-by: Aneesh Kumar K.V
---
arch/powerpc/include/asm/book3s/64/hash.h | 4 ++--
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +-
arch/powerpc/mm/copro_fault.c | 4 ++--
arch/powerpc/mm/
Firmware-Assisted Dump (FADump) is currently supported only on pseries
platform. This patch series adds support for powernv platform too.
The first and third patches refactor the FADump code to make use of common
code across multiple platforms. The fifth patch adds basic FADump support
for powernv
Refactoring fadump code means internal fadump code is referenced from
different places. For ease, move internal code to a new file.
Signed-off-by: Hari Bathini
---
Changes in v2:
* Using fadump-common.* instead of fadump_internal.*
arch/powerpc/include/asm/fadump.h | 112 --
The figures depicting FADump's (Firmware-Assisted Dump) memory layout
are missing some finer details like different memory regions and what
they represent. Improve the documentation by updating those details.
Signed-off-by: Hari Bathini
---
Documentation/powerpc/firmware-assisted-dump.txt | 65
Introduce callbacks for platform specific operations like register,
unregister, invalidate & such, and move pseries specific code into
platform code.
Signed-off-by: Hari Bathini
---
Changes in v2:
* pSeries specific fadump code files are named rtas-fadump.*
instead of pseries_fadump.*
arch/
Signed-off-by: Hari Bathini
---
Documentation/powerpc/firmware-assisted-dump.txt | 56 +++---
1 file changed, 28 insertions(+), 28 deletions(-)
diff --git a/Documentation/powerpc/firmware-assisted-dump.txt
b/Documentation/powerpc/firmware-assisted-dump.txt
index 059993b..62e75
From: Hari Bathini
Firmware-assisted dump support is enabled for OPAL based POWER platforms
in P9 firmware. Make the corresponding updates in the kernel to enable fadump
support for such platforms.
Signed-off-by: Hari Bathini
---
Changes in v2:
* Updated API number for FADump according to recent O
With FADump support now available on both pseries and OPAL platforms,
update FADump documentation with these details.
Signed-off-by: Hari Bathini
---
Documentation/powerpc/firmware-assisted-dump.txt | 90 --
1 file changed, 51 insertions(+), 39 deletions(-)
diff --git a/Do
Commit 0962e8004e97 ("powerpc/prom: Scan reserved-ranges node for
memory reservations") enabled support to parse reserved-ranges DT
node and reserve kernel memory falling in these ranges for F/W
purposes. Ensure memory in these ranges does not overlap with
memory reserved for FADump.
Also, use a
Commit 0962e8004e97 ("powerpc/prom: Scan reserved-ranges node for
memory reservations") enabled support to parse 'reserved-ranges' DT
node to reserve kernel memory falling in these ranges for firmware
purposes. Along with the preserved area memory, also ensure memory
in reserved ranges is not overl
From: Hari Bathini
Firmware provides architected register state data at the time of crash.
Process this data and build CPU notes to append to ELF core.
Signed-off-by: Hari Bathini
Signed-off-by: Vasant Hegde
---
Changes in v2:
* Updated reg type values according to recent OPAL changes
arch
Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP, that ensures
that crash data from the previously crashed kernel is preserved. This
helps in cases where FADump is not enabled but the subsequent memory
preserving kernel boot is likely to process this crash data. One
typical usecase for this co
Kernel config option CONFIG_PRESERVE_FA_DUMP is introduced to ensure
crash data from the previously crashed kernel is preserved. Update
the documentation with these details.
Signed-off-by: Hari Bathini
---
Documentation/powerpc/firmware-assisted-dump.txt |9 +
1 file changed, 9 insertions(
From: Hari Bathini
Export /proc/opalcore file to analyze opal crashes. Since opalcore can
be generated independent of CONFIG_FA_DUMP support in kernel, add this
support under a new kernel config option CONFIG_OPAL_CORE. Also, avoid
code duplication by moving common code used for processing the re
If OPAL crashes when the kernel is not registered for FADump, F/W still
exports the OPAL core through the result-table DT node. Make sure
'/proc/vmcore' processing is skipped, as only data relevant to the OPAL
core is exported in such a scenario.
Signed-off-by: Hari Bathini
---
arch/powerpc/platforms/powernv/o
Writing '1' to /sys/kernel/fadump_release_opalcore releases the
memory held by the kernel for exporting the /proc/opalcore file.
Signed-off-by: Hari Bathini
---
arch/powerpc/platforms/powernv/opal-core.c | 39
1 file changed, 39 insertions(+)
diff --git a/arch/powerpc
OPAL loads kernel & initrd at 512MB offset (256MB size), also exported
as ibm,opal/dump/fw-load-area. So, if boot memory size of FADump is
less than 768MB, kernel memory to be exported as '/proc/vmcore' would
be overwritten by f/w while loading kernel & initrd. To avoid such a
scenario, enforce a m
With /proc/opalcore support available on OPAL based machines and an
option to release memory used by kernel in exporting /proc/opalcore,
update FADump documentation with these details.
Signed-off-by: Hari Bathini
---
Documentation/powerpc/firmware-assisted-dump.txt | 19 +++
1
On Mon, Apr 15, 2019 at 04:22:57PM +0200, Arnd Bergmann wrote:
> Add the io_uring and pidfd_send_signal system calls to all architectures.
>
> These system calls are designed to handle both native and compat tasks,
> so all entries are the same across architectures, only arm-compat and
> the gener
On Tue, 16 Apr 2019 11:09:06 +0200
Martin Schwidefsky wrote:
> On Mon, 15 Apr 2019 09:17:10 -0700
> Linus Torvalds wrote:
>
> > On Sun, Apr 14, 2019 at 10:19 PM Christoph Hellwig
> > wrote:
> > >
> > > Can we please have the page refcount overflow fixes out on the list
> > > for review, eve
On 16/04/2019 06:59, Florian Weimer wrote:
> * hpa:
>
>> Using symbol versioning doesn't really help much since the real
>> problem is that struct termios can be passed around in userspace, and
>> the interfaces between user space libraries don't have any
>> versioning. However, my POC code dea
Hi Catalin,
On 15/04/2019 18:35, Catalin Marinas wrote:
> On Mon, Apr 01, 2019 at 12:51:48PM +0100, Vincenzo Frascino wrote:
>> diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
>> index 2d419006ad43..47ba72345739 100644
>> --- a/arch/arm64/kernel/vdso.c
>> +++ b/arch/arm64/kernel/v
Aneesh Kumar K.V's on April 16, 2019 8:07 pm:
> We now have
>
> 4K page size config
>
> kernel_region_map_size = 16TB
> kernel vmalloc start = 0xc0001000
> kernel IO start= 0xc0002000
> kernel vmemmap start = 0xc0003000
>
> with 64K page size config:
>
>
On Fri, Apr 12, 2019 at 03:39:28PM -0500, Josh Poimboeuf wrote:
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 38890f62f9a8..aed9083f8eac 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2320,3 +2320,18 @@ void __init boot_cpu_hotplug_init(void)
> #endif
> this_cpu_write(cpuhp_stat
On Tue, Apr 16, 2019 at 01:42:58PM +0100, Vincenzo Frascino wrote:
> On 15/04/2019 18:35, Catalin Marinas wrote:
> > On Mon, Apr 01, 2019 at 12:51:48PM +0100, Vincenzo Frascino wrote:
> >> +1:/* Get hrtimer_res */
> >> + seqcnt_acquire
> >> + syscall_check fail=5f
> >> + ldr x2, [vds
On Tue, Apr 16, 2019 at 03:44:55PM +0200, Laurent Dufour wrote:
> From: Mahendran Ganesh
>
> Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT for arm64. This
> enables Speculative Page Fault handler.
>
> Signed-off-by: Ganesh Mahendran
This is missing your S-o-B.
The first patch noted that the ARCH_S
On Tue, Apr 16, 2019 at 04:31:27PM +0200, Laurent Dufour wrote:
> Le 16/04/2019 à 16:27, Mark Rutland a écrit :
> > On Tue, Apr 16, 2019 at 03:44:55PM +0200, Laurent Dufour wrote:
> > > From: Mahendran Ganesh
> > >
> > > Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT for arm64. This
> > > enables Specu
On Fri, Apr 12, 2019 at 02:11:31PM +0100, Robin Murphy wrote:
> On 12/04/2019 11:26, John Garry wrote:
> > On 09/04/2019 13:53, Zhen Lei wrote:
> > > +static int __init iommu_dma_mode_setup(char *str)
> > > +{
> > > + if (!str)
> > > + goto fail;
> > > +
> > > + if (!strncmp(str, "pass
On Tue, Apr 16, 2019 at 04:13:35PM +0200, Borislav Petkov wrote:
> On Fri, Apr 12, 2019 at 03:39:28PM -0500, Josh Poimboeuf wrote:
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 38890f62f9a8..aed9083f8eac 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -2320,3 +2320,18 @@ void _
clock_getres in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().
In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or
The current version of the multiarch vDSO selftest verifies only
gettimeofday.
Extend the vDSO selftest to clock_getres, to verify that the
syscall and the vDSO library function return the same information.
The extension has been used to verify the hrtimer_resolution fix.
Cc: Shuah Khan
Signed-o
On Tue, Apr 16, 2019 at 5:08 AM Martin Schwidefsky
wrote:
>
> This is not nice, would a patch like the following be acceptable?
Umm.
We actually already *have* this function.
It's called "gup_fast_permitted()" and it's used by x86-64 to verify
the proper address range. Exactly like s390 needs..
On Tue, Apr 16, 2019 at 05:14:30PM +0100, Vincenzo Frascino wrote:
> diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
> index 2d419006ad43..5f5759d51c33 100644
> --- a/arch/arm64/kernel/vdso.c
> +++ b/arch/arm64/kernel/vdso.c
> @@ -245,6 +245,8 @@ void update_vsyscall(struct timekee
On Tue, Apr 16, 2019 at 9:16 AM Linus Torvalds
wrote:
>
> We actually already *have* this function.
>
> It's called "gup_fast_permitted()" and it's used by x86-64 to verify
> the proper address range. Exactly like s390 needs..
>
> Could you please use that instead?
IOW, something like the attache
On Tue, Apr 16, 2019 at 05:14:34PM +0100, Vincenzo Frascino wrote:
> The current version of the multiarch vDSO selftest verifies only
> gettimeofday.
>
> Extend the vDSO selftest to clock_getres, to verify that the
> syscall and the vDSO library function return the same information.
>
> The exten
On Tue, Apr 16, 2019 at 05:24:33PM +0100, Catalin Marinas wrote:
> On Tue, Apr 16, 2019 at 05:14:30PM +0100, Vincenzo Frascino wrote:
> > diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
> > index 2d419006ad43..5f5759d51c33 100644
> > --- a/arch/arm64/kernel/vdso.c
> > +++ b/arch/ar
On Fri, 12 Apr 2019, Josh Poimboeuf wrote:
> Configure arm64 runtime CPU speculation bug mitigations in accordance
> with the 'mitigations=' cmdline option. This affects Meltdown, Spectre
> v2, and Speculative Store Bypass.
>
> The default behavior is unchanged.
>
> Signed-off-by: Josh Poimboeu
On Fri, 12 Apr 2019 20:09:16 -0700
Guenter Roeck wrote:
> The big real-world question is: Is the series good enough for you to accept,
> or do you expect some level of user/kernel separation ?
I guess it can go in; it's forward progress, even if it doesn't make the
improvements I would like to s
On Tue, Apr 16, 2019 at 09:26:13PM +0200, Thomas Gleixner wrote:
> On Fri, 12 Apr 2019, Josh Poimboeuf wrote:
>
> > Configure arm64 runtime CPU speculation bug mitigations in accordance
> > with the 'mitigations=' cmdline option. This affects Meltdown, Spectre
> > v2, and Speculative Store Bypass
Hi Al,
It took me way longer than I had hoped to revisit this series, see
https://lore.kernel.org/lkml/20180912150142.157913-1-a...@arndb.de/
for the previously posted version.
I've come to the point where all conversion handlers and most
COMPATIBLE_IOCTL() entries are gone from this file, but fo
A handful of drivers all have a trivial wrapper around their ioctl
handler, but don't call the compat_ptr() conversion function at the
moment. In practice this does not matter, since none of them are used
on the s390 architecture and for all other architectures, compat_ptr()
does not do anything, b
On Tue, Apr 16, 2019 at 02:19:49PM -0600, Jonathan Corbet wrote:
> On Fri, 12 Apr 2019 20:09:16 -0700
> Guenter Roeck wrote:
>
> > The big real-world question is: Is the series good enough for you to accept,
> > or do you expect some level of user/kernel separation ?
>
> I guess it can go in; it
This configuration variable will be used to build the code needed to
handle speculative page faults.
By default it is turned off, and activated depending on architecture
support, ARCH_HAS_PTE_SPECIAL, SMP and MMU.
The architecture support is needed since the speculative page fault handler
is calle
This allows searching for a VMA structure without holding the mmap_sem.
The search is repeated while the mm seqlock is changing, until a valid
VMA is found.
While under RCU protection, a reference is taken on the VMA, so the
caller must call put_vma() once it no longer needs the VMA structure.
When handling a speculative page fault, the vma->vm_flags and
vma->vm_page_prot fields are read once the page table lock is released.
So there is no longer any guarantee that these fields will not change
behind our back.
They will be saved in the vm_fault structure before the VMA is checked for
changes.
In t
Add a speculative_pgfault vmstat counter to count successful speculative
page fault handling.
Also fix a minor typo in include/linux/vm_event_item.h.
Signed-off-by: Laurent Dufour
---
include/linux/vm_event_item.h | 3 +++
mm/memory.c | 3 +++
mm/vmstat.c |
When dealing with the speculative fault path, we should use the VMA
field values cached in the vm_fault structure.
Currently vm_normal_page() uses the pointer to the VMA to fetch the
vm_flags value. This patch provides a new __vm_normal_page() which
receives the vm_flags value
Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT for BOOK3S_64. This enables
the Speculative Page Fault handler.
Support is currently only provided for BOOK3S_64 because:
- require CONFIG_PPC_STD_MMU because checks done in
set_access_flags_filter()
- require BOOK3S because we can't support for book3e_hug
migrate_misplaced_page() is only called during the page fault handling so
it's better to pass the pointer to the struct vm_fault instead of the vma.
This way, the saved vma->vm_flags can be used during the speculative
page fault path.
Acked-by: David Rientjes
Signed-off-by: Laurent Dufour
---
The speculative page fault handler must be protected against anon_vma
changes. This is because page_add_new_anon_rmap() is called during the
speculative path.
In addition, don't try a speculative page fault if the VMA doesn't have
an anon_vma structure allocated, because its allocation should be
protec
From: Peter Zijlstra
Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
counts such that we can easily test if a VMA has changed.
The calls to vm_write_begin/end() in unmap_page_range() are
used to detect when a VMA is being unmapped, and thus that a new page
fault should not be satisfied.
When handling a page fault without holding the mmap_sem, the fetch of
the pte lock pointer and the locking will have to be done while ensuring
that the VMA is not modified behind our back.
So move the fetch and locking operations into a dedicated function.
Signed-off-by: Laurent Dufour
---
mm/memory.c |
This patch enables the speculative page fault on the PowerPC
architecture.
This tries a speculative page fault without holding the mmap_sem;
if it returns VM_FAULT_RETRY, the mmap_sem is acquired and the
traditional page fault processing is done.
The speculative path is only tried for multithreaded processes
Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT which turns on the
Speculative Page Fault handler when building for 64bit.
Cc: Thomas Gleixner
Signed-off-by: Laurent Dufour
---
arch/x86/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0f2ab09da060..
This allows turning on/off the use of the speculative page fault handler.
By default it's turned on.
Signed-off-by: Laurent Dufour
---
include/linux/mm.h | 3 +++
kernel/sysctl.c| 9 +
mm/memory.c| 3 +++
3 files changed, 15 insertions(+)
diff --git a/include/linux/mm.h b/i
pte_unmap_same() assumes that the page tables are still around because
the mmap_sem is held.
This is no longer the case when running a speculative page fault, and
additional checks must be made to ensure that the final page tables are
still there.
This is now done by calling pte_spinlock()
From: Mahendran Ganesh
This patch enables the speculative page fault on the arm64
architecture.
I completed the SPF porting in 4.9. From the test results,
we can see app launching time improved by about 10% on average.
For the apps which have more than 50 threads, 15% or even more
improvement can be
From: Peter Zijlstra
When speculating faults (without holding mmap_sem) we need to validate
that the vma against which we loaded pages is still valid when we're
ready to install the new PTE.
Therefore, replace the pte_offset_map_lock() calls that (re)take the
PTL with pte_map_lock(), which can fail.
Add support for the new speculative faults event.
Acked-by: David Rientjes
Signed-off-by: Laurent Dufour
---
tools/include/uapi/linux/perf_event.h | 1 +
tools/perf/util/evsel.c | 1 +
tools/perf/util/parse-events.c| 4
tools/perf/util/parse-events.l| 1 +
too
Introduce a per-mm_struct seqlock, the mm_seq field, to protect the
changes made in the MM RB tree. This allows walking the RB tree without
grabbing the mmap_sem and, once the walk is done, double-checking that
the sequence counter was stable during the walk.
The mm seqlock is held while inserting and removing entries in the RB
tree.
The speculative page fault handler, which runs without holding the
mmap_sem, calls lru_cache_add_active_or_unevictable(), but the vm_flags
value is not guaranteed to remain constant.
Introduce __lru_cache_add_active_or_unevictable(), which takes the
vm_flags value as a parameter instead of the vma pointer.
If a thread is remapping an area while another one is faulting on the
destination area, the SPF handler may fetch the vma from the RB tree before
the pte has been moved by the other thread. This means that the moved
PTEs will overwrite those created by the page fault handler, leading to
pages being leaked.
From: Peter Zijlstra
Try a speculative fault before acquiring mmap_sem; if it returns
VM_FAULT_RETRY, continue with the mmap_sem acquisition and do the
traditional fault.
Signed-off-by: Peter Zijlstra (Intel)
[Clearing of FAULT_FLAG_ALLOW_RETRY is now done in
handle_speculative_fault()]
[
From: Mahendran Ganesh
Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT for arm64. This
enables Speculative Page Fault handler.
Signed-off-by: Ganesh Mahendran
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 870ef86a64ed..8e86934
The VMA sequence count has been introduced to allow fast detection of
VMA modification when running a page fault handler without holding
the mmap_sem.
This patch provides protection against the VMA modifications done in:
- madvise()
- mpol_rebind_policy()
- vma_replace_policy()
The final goal is to be able to use a VMA structure without holding the
mmap_sem and to be sure that the structure will not be freed behind our
back.
The lockless use of the VMA will be done through RCU protection and thus a
dedicated freeing service is required to manage it asynchronously.
As report
Add a new software event to count successful speculative page faults.
Acked-by: David Rientjes
Signed-off-by: Laurent Dufour
---
include/uapi/linux/perf_event.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7198ddd0c6b
Vinayak Menon faced a panic because one thread was page faulting a page in
swap, while another one was mprotecting a part of the VMA leading to a VMA
split.
This raised a panic in swap_vma_readahead() because the VMA's boundaries
no longer matched the faulting address.
To avoid this, if the pa
Vinayak Menon and Ganesh Mahendran reported that the following scenario may
lead to thread being blocked due to data corruption:
CPU 1 CPU 2CPU 3
Process 1, Process 1, Process 1,
Thread AThread B
From: Peter Zijlstra
Provide infrastructure to do a speculative fault (not holding
mmap_sem).
Not holding the mmap_sem means we can race against VMA
change/removal and page-table destruction. We use the SRCU VMA freeing
to keep the VMA around. We use the VMA seqcount to detect change
(includi
Le 16/04/2019 à 16:27, Mark Rutland a écrit :
On Tue, Apr 16, 2019 at 03:44:55PM +0200, Laurent Dufour wrote:
From: Mahendran Ganesh
Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT for arm64. This
enables Speculative Page Fault handler.
Signed-off-by: Ganesh Mahendran
This is missing your S-o-B.
When dealing with the speculative page fault handler, we may race with a
VMA being split or merged. In this case the vma->vm_start and
vma->vm_end fields may not match the address at which the page fault is
occurring.
This can only happen when the VMA is split, but in that case the
anon_vma pointer of the new VM
This patch adds a set of new trace events to collect the speculative
page fault event failures.
Signed-off-by: Laurent Dufour
---
include/trace/events/pagefault.h | 80
mm/memory.c | 57 ++-
2 files changed, 125 insertions(+),
This is a port, on top of kernel 5.1, of the work done by Peter Zijlstra
to handle page faults without holding the mm semaphore [1].
The idea is to try to handle user space page faults without holding the
mmap_sem. This should allow better concurrency for massively threaded
processes, since the page fault hand
Some VMA struct fields need to be initialized once the VMA structure is
allocated.
Currently this only concerns the anon_vma_chain field, but others will
be added to support the speculative page fault.
Instead of spreading the initialization calls all over the code, let's
introduce a dedicated inline function.
This patch corrects the SPDX License Identifier style
in the powerpc Hardware Architecture related files.
Suggested-by: Joe Perches
Signed-off-by: Nishad Kamdar
---
arch/powerpc/include/asm/pnv-ocxl.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/p