Suppose a 64-bit task A traces a 32-bit task B.  B makes a syscall
that uses ERESTART_RESTARTBLOCK and gets a signal.  A catches
syscall exit, snapshots B's regs, changes the regs, and resumes.
Then A restores the snapshot of B's regs.

A expects B to resume where it left off when snapshotted, which means
that B should execute sys_restart_block().  We have a big set of hacks
in place that make this work in some cases, but in this particular
case, the hacks fall apart.  When A restores regs, because A is
64-bit, ptrace() will *not* set TS_I386_REGS_POKED.  But, because B
isn't stopped in a syscall, TS_COMPAT won't be set either.  As a
result, the current code will set nr=219 (the 64-bit
__NR_restart_syscall) and then promptly invoke madvise() (the 32-bit
syscall with nr==219).

This patch fixes the bug and simplifies the code simultaneous by
simply wiring up nr==380 to refer to sys_restart_block() in all cases.

Cc: Pedro Alves <pal...@redhat.com>
Cc: Oleg Nesterov <o...@redhat.com>
Cc: Kees Cook <keesc...@chromium.org>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 arch/x86/entry/syscalls/syscall_32.tbl |  2 ++
 arch/x86/entry/syscalls/syscall_64.tbl |  3 +++
 arch/x86/kernel/signal.c               | 34 ++++++++--------------------------
 3 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index 4cddd17153fb..9fc9272d0c41 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -386,3 +386,5 @@
 377    i386    copy_file_range         sys_copy_file_range
 378    i386    preadv2                 sys_preadv2                     
compat_sys_preadv2
 379    i386    pwritev2                sys_pwritev2                    
compat_sys_pwritev2
+# Same as restart_syscall but with the same nr for all ABIs.
+380    i386    restart_syscall2        sys_restart_syscall
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index 555263e385c9..f074952eaad8 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -336,6 +336,9 @@
 327    64      preadv2                 sys_preadv2
 328    64      pwritev2                sys_pwritev2
 
+# Same as restart_syscall but with the same nr for all ABIs.
+380    common  restart_syscall2        sys_restart_syscall
+
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
 # for native 64-bit operation.
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 6b952e1d8db8..b9f5133867f3 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -761,35 +761,17 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
 {
        /*
-        * This function is fundamentally broken as currently
-        * implemented.
-        *
-        * The idea is that we want to trigger a call to the
-        * restart_block() syscall and that we want in_ia32_syscall(),
-        * in_x32_syscall(), etc.  to match whatever they were in the
-        * syscall being restarted.  We assume that the syscall
-        * instruction at (regs->ip - 2) matches whatever syscall
-        * instruction we used to enter in the first place.
-        *
-        * The problem is that we can get here when ptrace pokes
-        * syscall-like values into regs even if we're not in a syscall
-        * at all.
-        *
-        * For now, we maintain historical behavior and guess based on
-        * stored state.  We could do better by saving the actual
-        * syscall arch in restart_block or (with caveats on x32) by
-        * checking if regs->ip points to 'int $0x80'.  The current
-        * behavior is incorrect if a tracer has a different bitness
-        * than the tracee.
+        * We can't reliably distinguish between restarting an i386
+        * syscall and an x86_64 syscall without decoding the
+        * instruction at regs->ip -= 2.  Rather than trying, restart
+        * using __NR_restart_syscall2, which works regardless of ABI.
+        * We still want to preserve the x32 state, but we can do that
+        * directly.
         */
-#ifdef CONFIG_IA32_EMULATION
-       if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED))
-               return __NR_ia32_restart_syscall;
-#endif
 #ifdef CONFIG_X86_X32_ABI
-       return __NR_restart_syscall | (regs->orig_ax & __X32_SYSCALL_BIT);
+       return __NR_restart_syscall2 | (regs->orig_ax & __X32_SYSCALL_BIT);
 #else
-       return __NR_restart_syscall;
+       return __NR_restart_syscall2;
 #endif
 }
 
-- 
2.5.5

Reply via email to