date:20181108

Re: [PATCH v3 1/2] kretprobe: produce sane stack traces

2018-11-08 Thread Aleksa Sarai

On 2018-11-08, Aleksa Sarai  wrote:
> I will attach what I have at the moment to hopefully explain what the
> issue I've found is (re-using the kretprobe architecture but with the
> shadow-stack idea).

Here is the patch I have at the moment (it works, except for the
question I have about how to handle the top-level pt_regs -- I've marked
that code with XXX).

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH


--8<-

Since the return address is modified by kretprobe, the various unwinders
can produce invalid and confusing stack traces. ftrace mostly solved
this problem by teaching each unwinder how to find the original return
address for stack trace purposes. This same technique can be applied to
kretprobes by simply adding a pointer to where the return address was
replaced in the stack, and then looking up the relevant
kretprobe_instance when a stack trace is requested.

[WIP: This is currently broken because the *first entry* will not be
  overwritten since it looks like the stack pointer is different
  when we are provided pt_regs. All other addresses are correctly
  handled.]

Signed-off-by: Aleksa Sarai 
---
 arch/x86/events/core.c |  6 +++-
 arch/x86/include/asm/ptrace.h  |  1 +
 arch/x86/kernel/kprobes/core.c |  5 ++--
 arch/x86/kernel/stacktrace.c   | 10 +--
 arch/x86/kernel/unwind_frame.c |  2 ++
 arch/x86/kernel/unwind_guess.c |  5 +++-
 arch/x86/kernel/unwind_orc.c   |  2 ++
 include/linux/kprobes.h| 15 +-
 kernel/kprobes.c   | 55 ++
 9 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index de32741d041a..d71062702179 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2371,7 +2371,11 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx 
*entry, struct pt_regs *re
return;
}
 
-   if (perf_callchain_store(entry, regs->ip))
+   /* XXX: Currently broken -- stack_addr(regs) doesn't match entry. */
+   addr = regs->ip;
+   //addr = ftrace_graph_ret_addr(current, &state.graph_idx, addr, 
stack_addr(regs));
+   addr = kretprobe_ret_addr(current, addr, stack_addr(regs));
+   if (perf_callchain_store(entry, addr))
return;
 
for (unwind_start(&state, current, regs, NULL); !unwind_done(&state);
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index ee696efec99f..c4dfafd43e11 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -172,6 +172,7 @@ static inline unsigned long kernel_stack_pointer(struct 
pt_regs *regs)
return regs->sp;
 }
 #endif
+#define stack_addr(regs) ((unsigned long *) kernel_stack_pointer(regs))
 
 #define GET_IP(regs) ((regs)->ip)
 #define GET_FP(regs) ((regs)->bp)
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index b0d1e81c96bb..eb4da885020c 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -69,8 +69,6 @@
 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
 
-#define stack_addr(regs) ((unsigned long *)kernel_stack_pointer(regs))
-
 #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
  (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) |   \
@@ -568,7 +566,8 @@ void arch_prepare_kretprobe(struct kretprobe_instance *ri, 
struct pt_regs *regs)
 {
unsigned long *sara = stack_addr(regs);
 
-   ri->ret_addr = (kprobe_opcode_t *) *sara;
+   ri->ret_addrp = (kprobe_opcode_t **) sara;
+   ri->ret_addr = *ri->ret_addrp;
 
/* Replace the return addr with trampoline addr */
*sara = (unsigned long) &kretprobe_trampoline;
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 7627455047c2..8a4fb3109d6b 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -37,8 +38,13 @@ static void noinline __save_stack_trace(struct stack_trace 
*trace,
struct unwind_state state;
unsigned long addr;
 
-   if (regs)
-   save_stack_address(trace, regs->ip, nosched);
+   if (regs) {
+   /* XXX: Currently broken -- stack_addr(regs) doesn't match 
entry. */
+   addr = regs->ip;
+   //addr = ftrace_graph_ret_addr(current, &state.graph_idx, addr, 
stack_addr(regs));
+   addr = kretprobe_ret_addr(current, addr, stack_addr(regs));
+   save_stack_address(trace, addr, nosched);
+   }
 
for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
 unwind_next_frame(&state))

Re: [PATCH v14 00/12] Enable cpuset controller in default hierarchy

2018-11-08 Thread Peter Zijlstra

On Wed, Nov 07, 2018 at 01:32:29PM -0800, Tejun Heo wrote:

> How about the following?
> 
> Interface file is named "cpuset.cpus.partition" as it's a partition of
> cpus.  The writeable keys are "root" and "member" - a partition root
> and a partition member.  When "root" is requested but can't be
> satisfied it shows "root invalid".  This seems pretty consistent with
> cgroup.type and other cpuset files.

I can live with that, thanks!

RE: [PATCH v2] Document /proc/pid PID reuse behavior

2018-11-08 Thread David Laight

From: Martin Steigerwald
> Sent: 07 November 2018 17:05
...
> Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
> bits. Right? Could it be raised to 32 bits? I bet it would be a major
> change throughout different parts of the kernel.

It is probably 15 bits (since -ve pid numbers are used for process groups).

My guess is that userspace and the system call interface will handle 32bit
(signed) pid numbers.
(I don't remember 'linux emulation' being one of the emulations that
would truncate 32bit pids when one of the BDSs went to 32bit pids.)
The main problem will be that big numbers will mess up the alignment
of printouts from ps and top (etc).
This can be mitigated by only allocating 'big' numbers on systems
that have a lot of pids.
You also really want an O(1) allocator.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

Re: [PATCH v10 12/22] kasan, arm64: fix up fault handling logic

2018-11-08 Thread Mark Rutland

On Tue, Nov 06, 2018 at 06:30:27PM +0100, Andrey Konovalov wrote:
> show_pte in arm64 fault handling relies on the fact that the top byte of
> a kernel pointer is 0xff, which isn't always the case with tag-based
> KASAN.

That's for the TTBR1 check, right?

i.e. for the following to work:

if (addr >= VA_START)

... we need the tag bits to be an extension of bit 55...

> 
> This patch resets the top byte in show_pte.
> 
> Reviewed-by: Andrey Ryabinin 
> Reviewed-by: Dmitry Vyukov 
> Signed-off-by: Andrey Konovalov 
> ---
>  arch/arm64/mm/fault.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 7d9571f4ae3d..d9a84d6f3343 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -141,6 +142,8 @@ void show_pte(unsigned long addr)
>   pgd_t *pgdp;
>   pgd_t pgd;
>  
> + addr = (unsigned long)kasan_reset_tag((void *)addr);

... but this ORs in (0xffUL << 56), which is not correct for addresses
which aren't TTBR1 addresses to begin with, where bit 55 is clear, and
throws away useful information.

We could use untagged_addr() here, but that wouldn't be right for
kernels which don't use TBI1, and we'd erroneously report addresses
under the TTBR1 range as being in the TTBR1 range.

I also see that the entry assembly for el{1,0}_{da,ia} clears the tag
for EL0 addresses.

So we could have:

static inline bool is_ttbr0_addr(unsigned long addr)
{
/* entry assembly clears tags for TTBR0 addrs */
return addr < TASK_SIZE_64;
}

static inline bool is_ttbr1_addr(unsigned long addr)
{
/* TTBR1 addresses may have a tag if HWKASAN is in use */
return arch_kasan_reset_tag(addr) >= VA_START;
}

... and use those in the conditionals, leaving the addr as-is for
reporting purposes.

Thanks,
Mark.

Re: [PATCH v2] Document /proc/pid PID reuse behavior

2018-11-08 Thread Matthew Wilcox

On Thu, Nov 08, 2018 at 12:02:49PM +, David Laight wrote:
> From: Martin Steigerwald
> > Sent: 07 November 2018 17:05
> ...
> > Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
> > bits. Right? Could it be raised to 32 bits? I bet it would be a major
> > change throughout different parts of the kernel.
> 
> It is probably 15 bits (since -ve pid numbers are used for process groups).
> 
> My guess is that userspace and the system call interface will handle 32bit
> (signed) pid numbers.
> (I don't remember 'linux emulation' being one of the emulations that
> would truncate 32bit pids when one of the BDSs went to 32bit pids.)
> The main problem will be that big numbers will mess up the alignment
> of printouts from ps and top (etc).
> This can be mitigated by only allocating 'big' numbers on systems
> that have a lot of pids.
> You also really want an O(1) allocator.

The allocator is O(log n) -- it's the IDR allocator, used in cyclic mode.
n in this case is the highest ID which is still in use.  The tree is
log_64(n) levels high.  It walks to the bottom of the tree and puts a
pointer into the tree.  If the cursor has wrapped to the beginning of
the tree, it may encounter a PID which is still in use; if it does,
it does a bitmap scan of that node, and will then walk up the tree,
doing a bitmap scan forward at each level until it finds a free PID.

So it's not exactly O(log(n)), but it's close enough for all practical
purposes.  And more importantly, it doesn't touch a lot of cachelines.
Two or three at each level of the tree that it accesses.  If we went
all the way to a 32-bit PID, the tree would grow to 6 levels deep,
and worst-case would touch 6 + 5 + 4 levels of the tree (starting with
trying to allocate PID 0x, failing, trying to allocate PID 300,
then having to walk all the way forward to find PID 0xe000), so
that's only 45 cachelines.

People care a little too much about O(1)/O(n) behaviour.  Cacheline
behaviour, and good average-case performance without falling off a cliff
in the worst case is much more important.

Re: [PATCH v2] Document /proc/pid PID reuse behavior

2018-11-08 Thread Michal Hocko

On Wed 07-11-18 18:04:59, Martin Steigerwald wrote:
> Michal Hocko - 07.11.18, 17:00:
> > > > otherwise anybody could simply DoS the system
> > > > by consuming all available pids.
> > > 
> > > People can do that today using the instrument of terror widely known
> > > as fork(2). The only thing standing between fork(2) and a full
> > > process table is RLIMIT_NPROC.
> > 
> > not really. If you really do care about pid space depletion then you
> > should use pid cgroup controller.
> 
> Its not quite on-topic, but I am curious now: AFAIK PID limit is 16 
> bits. Right? Could it be raised to 32 bits? I bet it would be a major 
> change throughout different parts of the kernel.
> 
> 16 bits sound a bit low these days, not only for PIDs, but also for 
> connections / ports.

Do you have any specific example of the pid space exhaustion? Well
except for a fork bomb attacks that could be mitigated by the pid cgroup
controller.
-- 
Michal Hocko
SUSE Labs

RE: [PATCH v2] Document /proc/pid PID reuse behavior

2018-11-08 Thread David Laight

From: Matthew Wilcox
> Sent: 08 November 2018 12:28
>
> On Thu, Nov 08, 2018 at 12:02:49PM +, David Laight wrote:
> > From: Martin Steigerwald
> > > Sent: 07 November 2018 17:05
> > ...
> > > Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
> > > bits. Right? Could it be raised to 32 bits? I bet it would be a major
> > > change throughout different parts of the kernel.
> >
> > It is probably 15 bits (since -ve pid numbers are used for process groups).
> >
> > My guess is that userspace and the system call interface will handle 32bit
> > (signed) pid numbers.
> > (I don't remember 'linux emulation' being one of the emulations that
> > would truncate 32bit pids when one of the BDSs went to 32bit pids.)
> > The main problem will be that big numbers will mess up the alignment
> > of printouts from ps and top (etc).
> > This can be mitigated by only allocating 'big' numbers on systems
> > that have a lot of pids.
> > You also really want an O(1) allocator.
> 
> The allocator is O(log n) -- it's the IDR allocator, used in cyclic mode.
> n in this case is the highest ID which is still in use.  The tree is
> log_64(n) levels high.  It walks to the bottom of the tree and puts a
> pointer into the tree.  If the cursor has wrapped to the beginning of
> the tree, it may encounter a PID which is still in use; if it does,
> it does a bitmap scan of that node, and will then walk up the tree,
> doing a bitmap scan forward at each level until it finds a free PID.

Right, but you can choose the pid so that you get a perfect hash.
You can then put a FIFO free list through the unused entries of
the hash table (just an array).
Then pid allocate just picks the oldest free entry and ups the
high bits (that the hash masks out) to make the old value stale.
Almost no cache lines are involved in the whole operation.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

Re: [PATCH v2] Document /proc/pid PID reuse behavior

2018-11-08 Thread Matthew Wilcox

On Thu, Nov 08, 2018 at 01:42:41PM +, David Laight wrote:
> From: Matthew Wilcox
> > On Thu, Nov 08, 2018 at 12:02:49PM +, David Laight wrote:
> > > This can be mitigated by only allocating 'big' numbers on systems
> > > that have a lot of pids.
> > > You also really want an O(1) allocator.
> > 
> > The allocator is O(log n) -- it's the IDR allocator, used in cyclic mode.
> > n in this case is the highest ID which is still in use.  The tree is
> > log_64(n) levels high.  It walks to the bottom of the tree and puts a
> > pointer into the tree.  If the cursor has wrapped to the beginning of
> > the tree, it may encounter a PID which is still in use; if it does,
> > it does a bitmap scan of that node, and will then walk up the tree,
> > doing a bitmap scan forward at each level until it finds a free PID.
> 
> Right, but you can choose the pid so that you get a perfect hash.
> You can then put a FIFO free list through the unused entries of
> the hash table (just an array).
> Then pid allocate just picks the oldest free entry and ups the
> high bits (that the hash masks out) to make the old value stale.
> Almost no cache lines are involved in the whole operation.

You'd be looking for something like dynamic perfect hashing then?
I think that'd make a great project for someone to try out.  I don't
have the time to look into it myself, and I'm not convinced there's
a real workload that would benefit.  But it'd be a great project for
a university student.

RE: [PATCH v2] Document /proc/pid PID reuse behavior

2018-11-08 Thread David Laight

From: Matthew Wilcox
> Sent: 08 November 2018 14:07
...
> 
> You'd be looking for something like dynamic perfect hashing then?
> I think that'd make a great project for someone to try out.  I don't
> have the time to look into it myself, and I'm not convinced there's
> a real workload that would benefit.  But it'd be a great project for
> a university student.

Been there, done that several times.
Half an afternoon's work.

I'm surprised there isn't a C++ container class that will give
you a 'id' for a pointer (or other large value).
But they all do it the other way around.

It is also trivial to double the size of the array and 'unzip'
the allocated items into their correct halves while adding the
other entry to the free list.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

[PATCH v8 4/8] mm, arm64: untag user addresses in mm/gup.c

2018-11-08 Thread Andrey Konovalov

mm/gup.c provides a kernel interface that accepts user addresses and
manipulates user pages directly (for example get_user_pages, that is used
by the futex syscall). Since a user can provided tagged addresses, we need
to handle such case.

Add untagging to gup.c functions that use user addresses for vma lookup.

Signed-off-by: Andrey Konovalov 
---
 mm/gup.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index f76e77a2d34b..6ff310818616 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -677,6 +677,8 @@ static long __get_user_pages(struct task_struct *tsk, 
struct mm_struct *mm,
if (!nr_pages)
return 0;
 
+   start = untagged_addr(start);
+
VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
 
/*
@@ -840,6 +842,8 @@ int fixup_user_fault(struct task_struct *tsk, struct 
mm_struct *mm,
struct vm_area_struct *vma;
vm_fault_t ret, major = 0;
 
+   address = untagged_addr(address);
+
if (unlocked)
fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 
-- 
2.19.1.930.g4563a0d9d0-goog

[PATCH v8 5/8] lib, arm64: untag addrs passed to strncpy_from_user and strnlen_user

2018-11-08 Thread Andrey Konovalov

strncpy_from_user and strnlen_user accept user addresses as arguments, and
do not go through the same path as copy_from_user and others, so here we
need to handle the case of tagged user addresses separately.

Untag user pointers passed to these functions.

Signed-off-by: Andrey Konovalov 
---
 lib/strncpy_from_user.c | 2 ++
 lib/strnlen_user.c  | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index b53e1b5d80f4..97467cd2bc59 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -106,6 +106,8 @@ long strncpy_from_user(char *dst, const char __user *src, 
long count)
if (unlikely(count <= 0))
return 0;
 
+   src = untagged_addr(src);
+
max_addr = user_addr_max();
src_addr = (unsigned long)src;
if (likely(src_addr < max_addr)) {
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index 60d0bbda8f5e..8b5f56466e00 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -108,6 +108,8 @@ long strnlen_user(const char __user *str, long count)
if (unlikely(count <= 0))
return 0;
 
+   str = untagged_addr(str);
+
max_addr = user_addr_max();
src_addr = (unsigned long)str;
if (likely(src_addr < max_addr)) {
-- 
2.19.1.930.g4563a0d9d0-goog

[PATCH v8 8/8] selftests, arm64: add a selftest for passing tagged pointers to kernel

2018-11-08 Thread Andrey Konovalov

This patch adds a simple test, that calls the uname syscall with a
tagged user pointer as an argument. Without the kernel accepting tagged
user pointers the test fails with EFAULT.

Signed-off-by: Andrey Konovalov 
---
 tools/testing/selftests/arm64/.gitignore  |  1 +
 tools/testing/selftests/arm64/Makefile| 11 +++
 .../testing/selftests/arm64/run_tags_test.sh  | 12 
 tools/testing/selftests/arm64/tags_test.c | 19 +++
 4 files changed, 43 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/.gitignore
 create mode 100644 tools/testing/selftests/arm64/Makefile
 create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
 create mode 100644 tools/testing/selftests/arm64/tags_test.c

diff --git a/tools/testing/selftests/arm64/.gitignore 
b/tools/testing/selftests/arm64/.gitignore
new file mode 100644
index ..e8fae8d61ed6
--- /dev/null
+++ b/tools/testing/selftests/arm64/.gitignore
@@ -0,0 +1 @@
+tags_test
diff --git a/tools/testing/selftests/arm64/Makefile 
b/tools/testing/selftests/arm64/Makefile
new file mode 100644
index ..a61b2e743e99
--- /dev/null
+++ b/tools/testing/selftests/arm64/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# ARCH can be overridden by the user for cross compiling
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+
+ifneq (,$(filter $(ARCH),aarch64 arm64))
+TEST_GEN_PROGS := tags_test
+TEST_PROGS := run_tags_test.sh
+endif
+
+include ../lib.mk
diff --git a/tools/testing/selftests/arm64/run_tags_test.sh 
b/tools/testing/selftests/arm64/run_tags_test.sh
new file mode 100755
index ..745f11379930
--- /dev/null
+++ b/tools/testing/selftests/arm64/run_tags_test.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+echo ""
+echo "running tags test"
+echo ""
+./tags_test
+if [ $? -ne 0 ]; then
+   echo "[FAIL]"
+else
+   echo "[PASS]"
+fi
diff --git a/tools/testing/selftests/arm64/tags_test.c 
b/tools/testing/selftests/arm64/tags_test.c
new file mode 100644
index ..1452ed7d33f9
--- /dev/null
+++ b/tools/testing/selftests/arm64/tags_test.c
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+
+#define SHIFT_TAG(tag) ((uint64_t)(tag) << 56)
+#define SET_TAG(ptr, tag)  (((uint64_t)(ptr) & ~SHIFT_TAG(0xff)) | \
+   SHIFT_TAG(tag))
+
+int main(void)
+{
+   struct utsname utsname;
+   void *ptr = &utsname;
+   void *tagged_ptr = (void *)SET_TAG(ptr, 0x42);
+   int err = uname(tagged_ptr);
+   return err;
+}
-- 
2.19.1.930.g4563a0d9d0-goog

[PATCH v8 0/8] arm64: untag user pointers passed to the kernel

2018-11-08 Thread Andrey Konovalov

arm64 has a feature called Top Byte Ignore, which allows to embed pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.

Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:

1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")

When passing tagged pointers to syscalls, there's a special case of such a
pointer being passed to one of the memory syscalls (mmap, mprotect, etc.).
These syscalls don't do memory accesses but rather deal with memory
ranges, hence an untagged pointer is better suited.

This patchset extends tagged pointer support to non-memory syscalls. This
is done by reusing the untagged_addr macro to untag user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok). The untagging is done only
when the pointer is being checked, the tag is preserved as the pointer
makes its way through the kernel.

One of the alternative approaches to untagging that was considered is to
completely strip the pointer tag as the pointer enters the kernel with
some kind of a syscall wrapper, but that won't work with the countless
number of different ioctl calls. With this approach we would need a custom
wrapper for each ioctl variation, which doesn't seem practical.

The following testing approaches has been taken to find potential issues
with user pointer untagging:

1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.

2. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.

Based on the results of the testing the requried patches have been added
to the patchset.

This patchset has been merged into the Pixel 2 kernel tree and is now
being used to enable testing of Pixel 2 phones with HWASan.

This patchset is a prerequisite for ARM's memory tagging hardware feature
support [3].

Thanks!

[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html

[2]
https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e0145f292

[3]
https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a

Changes in v8:
- Rebased onto 65102238 (4.20-rc1).
- Added a note to the cover letter on why syscall wrappers/shims that untag
user pointers won't work.
- Added a note to the cover letter that this patchset has been merged into
the Pixel 2 kernel tree.
- Documentation fixes, in particular added a list of syscalls that don't
support tagged user pointers.

Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since
the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the
passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse"
patch (see the discussion to the replies of the v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.

Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).

Changes in v5:
- Added 3 new patches that add untagging to places found with static
analysis.
- Rebased onto 44c929e1 (4.18-rc8).

Changes in v4:
- Added a selftest for checking that passing tagged pointers to the
kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).

Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.

Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.

Changes in v1:
- Rebased onto 4.17-rc1.

Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).

Reviewed-by: Luc Van Oostenryck
Signed-off-by: Andrey Konovalov

Andrey Konovalov (8):
arm64: add type casts to untagged_addr macro
uaccess: add untagged_addr definition for other arches
arm64: untag user addresses in access_ok and __uaccess_mask_ptr
mm, arm64: untag user addresses in mm/gup.c
lib, arm64: untag addrs passed to strnc

[PATCH v8 6/8] fs, arm64: untag user address in copy_mount_options

2018-11-08 Thread Andrey Konovalov

In copy_mount_options a user address is being subtracted from TASK_SIZE.
If the address is lower than TASK_SIZE, the size is calculated to not
allow the exact_copy_from_user() call to cross TASK_SIZE boundary.
However if the address is tagged, then the size will be calculated
incorrectly.

Untag the address before subtracting.

Signed-off-by: Andrey Konovalov 
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 98d27da43304..1f1f998d15ed 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2674,7 +2674,7 @@ void *copy_mount_options(const void __user * data)
 * the remainder of the page.
 */
/* copy_from_user cannot cross TASK_SIZE ! */
-   size = TASK_SIZE - (unsigned long)data;
+   size = TASK_SIZE - (unsigned long)untagged_addr(data);
if (size > PAGE_SIZE)
size = PAGE_SIZE;
 
-- 
2.19.1.930.g4563a0d9d0-goog

[PATCH v8 3/8] arm64: untag user addresses in access_ok and __uaccess_mask_ptr

2018-11-08 Thread Andrey Konovalov

copy_from_user (and a few other similar functions) are used to copy data
from user memory into the kernel memory or vice versa. Since a user can
provided a tagged pointer to one of the syscalls that use copy_from_user,
we need to correctly handle such pointers.

Do this by untagging user pointers in access_ok and in __uaccess_mask_ptr,
before performing access validity checks.

Signed-off-by: Andrey Konovalov 
---
 arch/arm64/include/asm/uaccess.h | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index c1325271e368..abc35cba134b 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -104,7 +104,8 @@ static inline unsigned long __range_ok(const void __user 
*addr, unsigned long si
 #define untagged_addr(addr)\
((__typeof__(addr))sign_extend64((__u64)(addr), 55))
 
-#define access_ok(type, addr, size)__range_ok(addr, size)
+#define access_ok(type, addr, size)\
+   __range_ok(untagged_addr(addr), size)
 #define user_addr_max  get_fs
 
 #define _ASM_EXTABLE(from, to) \
@@ -236,7 +237,8 @@ static inline void uaccess_enable_not_uao(void)
 
 /*
  * Sanitise a uaccess pointer such that it becomes NULL if above the
- * current addr_limit.
+ * current addr_limit. In case the pointer is tagged (has the top byte set),
+ * untag the pointer before checking.
  */
 #define uaccess_mask_ptr(ptr) (__typeof__(ptr))__uaccess_mask_ptr(ptr)
 static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
@@ -244,10 +246,11 @@ static inline void __user *__uaccess_mask_ptr(const void 
__user *ptr)
void __user *safe_ptr;
 
asm volatile(
-   "   bicsxzr, %1, %2\n"
+   "   bicsxzr, %3, %2\n"
"   csel%0, %1, xzr, eq\n"
: "=&r" (safe_ptr)
-   : "r" (ptr), "r" (current_thread_info()->addr_limit)
+   : "r" (ptr), "r" (current_thread_info()->addr_limit),
+ "r" (untagged_addr(ptr))
: "cc");
 
csdb();
-- 
2.19.1.930.g4563a0d9d0-goog

[PATCH v8 7/8] arm64: update Documentation/arm64/tagged-pointers.txt

2018-11-08 Thread Andrey Konovalov

Document the changes in Documentation/arm64/tagged-pointers.txt.

Signed-off-by: Andrey Konovalov 
---
 Documentation/arm64/tagged-pointers.txt | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/Documentation/arm64/tagged-pointers.txt 
b/Documentation/arm64/tagged-pointers.txt
index a25a99e82bb1..f4cf1f5cf362 100644
--- a/Documentation/arm64/tagged-pointers.txt
+++ b/Documentation/arm64/tagged-pointers.txt
@@ -17,13 +17,22 @@ this byte for application use.
 Passing tagged addresses to the kernel
 --
 
-All interpretation of userspace memory addresses by the kernel assumes
-an address tag of 0x00.
+The kernel supports tags in pointer arguments (including pointers in
+structures) for a limited set of syscalls, the exceptions are:
 
-This includes, but is not limited to, addresses found in:
+ - memory syscalls: brk, madvise, mbind, mincore, mlock, mlock2, move_pages,
+   mprotect, mremap, msync, munlock, munmap, pkey_mprotect, process_vm_readv,
+   process_vm_writev, remap_file_pages;
 
- - pointer arguments to system calls, including pointers in structures
-   passed to system calls,
+ - ioctls that accept user pointers that describe virtual memory ranges;
+
+ - TCP_ZEROCOPY_RECEIVE setsockopt.
+
+The kernel supports tags in user fault addresses. However the fault_address
+field in the sigcontext struct will contain an untagged address.
+
+All other interpretations of userspace memory addresses by the kernel
+assume an address tag of 0x00, in particular:
 
  - the stack pointer (sp), e.g. when interpreting it to deliver a
signal,
@@ -33,11 +42,7 @@ This includes, but is not limited to, addresses found in:
 
 Using non-zero address tags in any of these locations may result in an
 error code being returned, a (fatal) signal being raised, or other modes
-of failure.
-
-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
-strongly discouraged.
+of failure. Using a non-zero address tag for sp is strongly discouraged.
 
 Programs maintaining a frame pointer and frame records that use non-zero
 address tags may suffer impaired or inaccurate debug and profiling
-- 
2.19.1.930.g4563a0d9d0-goog

[PATCH v8 2/8] uaccess: add untagged_addr definition for other arches

2018-11-08 Thread Andrey Konovalov

To allow arm64 syscalls accept tagged pointers from userspace, we must
untag them when they are passed to the kernel. Since untagging is done in
generic parts of the kernel, the untagged_addr macro needs to be defined
for all architectures.

Define it as a noop for other architectures besides arm64.

Signed-off-by: Andrey Konovalov 
---
 include/linux/uaccess.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index efe79c1cdd47..c045b4eff95e 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -13,6 +13,10 @@
 
 #include 
 
+#ifndef untagged_addr
+#define untagged_addr(addr) addr
+#endif
+
 /*
  * Architectures should provide two primitives (raw_copy_{to,from}_user())
  * and get rid of their private instances of copy_{to,from}_user() and
-- 
2.19.1.930.g4563a0d9d0-goog

[PATCH v8 1/8] arm64: add type casts to untagged_addr macro

2018-11-08 Thread Andrey Konovalov

This patch makes the untagged_addr macro accept all kinds of address types
(void *, unsigned long, etc.) and allows not to specify type casts in each
place where it is used. This is done by using __typeof__.

Signed-off-by: Andrey Konovalov 
---
 arch/arm64/include/asm/uaccess.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 07c34087bd5e..c1325271e368 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -101,7 +101,8 @@ static inline unsigned long __range_ok(const void __user 
*addr, unsigned long si
  * up with a tagged userland pointer. Clear the tag to get a sane pointer to
  * pass on to access_ok(), for instance.
  */
-#define untagged_addr(addr)sign_extend64(addr, 55)
+#define untagged_addr(addr)\
+   ((__typeof__(addr))sign_extend64((__u64)(addr), 55))
 
 #define access_ok(type, addr, size)__range_ok(addr, size)
 #define user_addr_max  get_fs
-- 
2.19.1.930.g4563a0d9d0-goog

Re: [PATCH v15 23/23] x86/sgx: Driver documentation

2018-11-08 Thread Jarkko Sakkinen

On Wed, Nov 07, 2018 at 09:09:37AM -0800, Dave Hansen wrote:
> On 11/7/18 8:30 AM, Jarkko Sakkinen wrote:
> >> Does this code run when I type "make kselftest"?  If not, I think we
> >> should rectify that.
> > No, it doesn't. It is just my backup for the non-SDK user space code
> > that I've made that I will use to fork my user space SGX projects in
> > the future.
> > 
> > I can work-out a selftest (and provide a new patch in the series) but
> > I'm still wondering what the enclave should do. I would suggest that
> > we start with an enclave that does just EEXIT and nothing else.
> 
> Yeah, that's a start.  But, a good selftest would include things like
> faults and error conditions.

Great. We can add more entry points to the enclave for different tests
but I'll start with a bare minimum. And yeah but ever goes into next
version I'll document the fault handling.

/Jarkko

Re: [PATCH v3 1/2] kretprobe: produce sane stack traces

2018-11-08 Thread Josh Poimboeuf

On Thu, Nov 08, 2018 at 07:04:48PM +1100, Aleksa Sarai wrote:
> On 2018-11-08, Aleksa Sarai  wrote:
> > I will attach what I have at the moment to hopefully explain what the
> > issue I've found is (re-using the kretprobe architecture but with the
> > shadow-stack idea).
> 
> Here is the patch I have at the moment (it works, except for the
> question I have about how to handle the top-level pt_regs -- I've marked
> that code with XXX).
> 
> -- 
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> 
> 
> --8<-
> 
> Since the return address is modified by kretprobe, the various unwinders
> can produce invalid and confusing stack traces. ftrace mostly solved
> this problem by teaching each unwinder how to find the original return
> address for stack trace purposes. This same technique can be applied to
> kretprobes by simply adding a pointer to where the return address was
> replaced in the stack, and then looking up the relevant
> kretprobe_instance when a stack trace is requested.
> 
> [WIP: This is currently broken because the *first entry* will not be
>   overwritten since it looks like the stack pointer is different
>   when we are provided pt_regs. All other addresses are correctly
>   handled.]

When you see this problem, what does regs->ip point to?  If it's
pointing to generated code, then we don't _currently_ have a way of
dealing with that.  If it's pointing to a real function, we can fix that
with unwind hints.

-- 
Josh

Re: [PATCH v8 0/8] arm64: untag user pointers passed to the kernel

2018-11-08 Thread Andrey Konovalov

On Thu, Nov 8, 2018 at 3:36 PM, Andrey Konovalov  wrote:

[...]

> Changes in v8:
> - Rebased onto 65102238 (4.20-rc1).
> - Added a note to the cover letter on why syscall wrappers/shims that untag
>   user pointers won't work.
> - Added a note to the cover letter that this patchset has been merged into
>   the Pixel 2 kernel tree.
> - Documentation fixes, in particular added a list of syscalls that don't
>   support tagged user pointers.

Hi Catalin,

I've changed the documentation to be more specific, please take a look.

I haven't done anything about adding a way for the user to find out
that the kernel supports this ABI extension. I don't know what would
the the preferred way to do this, and we haven't received any comments
on that from anybody else. Probing "on some innocuous syscall
currently returning -EFAULT on tagged pointer arguments" works though,
as you mentioned.

As mentioned in the cover letter, this patchset has been merged into
the Pixel 2 kernel tree.

Thanks!

82 matches

Mail list logo