Suppose a 64-bit task A traces a 32-bit task B. B makes a syscall that uses ERESTART_RESTARTBLOCK and gets a signal. A catches syscall exit, snapshots B's regs, changes the regs, and resumes. Then A restores the snapshot of B's regs.
A expects B to resume where it left off when snapshotted, which means that B should execute sys_restart_block(). We have a big set of hacks in place that make this work in some cases, but in this particular case, the hacks fall apart. When A restores regs, because A is 64-bit, ptrace() will *not* set TS_I386_REGS_POKED. But, because B isn't stopped in a syscall, TS_COMPAT won't be set either. As a result, the current code will set nr=219 (the 64-bit __NR_restart_syscall) and then promptly invoke madvise() (the 32-bit syscall with nr==219). This patch fixes the bug and simplifies the code simultaneous by simply wiring up nr==380 to refer to sys_restart_block() in all cases. Cc: Pedro Alves <pal...@redhat.com> Cc: Oleg Nesterov <o...@redhat.com> Cc: Kees Cook <keesc...@chromium.org> Signed-off-by: Andy Lutomirski <l...@kernel.org> --- arch/x86/entry/syscalls/syscall_32.tbl | 2 ++ arch/x86/entry/syscalls/syscall_64.tbl | 3 +++ arch/x86/kernel/signal.c | 34 ++++++++-------------------------- 3 files changed, 13 insertions(+), 26 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 4cddd17153fb..9fc9272d0c41 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -386,3 +386,5 @@ 377 i386 copy_file_range sys_copy_file_range 378 i386 preadv2 sys_preadv2 compat_sys_preadv2 379 i386 pwritev2 sys_pwritev2 compat_sys_pwritev2 +# Same as restart_syscall but with the same nr for all ABIs. +380 i386 restart_syscall2 sys_restart_syscall diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 555263e385c9..f074952eaad8 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -336,6 +336,9 @@ 327 64 preadv2 sys_preadv2 328 64 pwritev2 sys_pwritev2 +# Same as restart_syscall but with the same nr for all ABIs. +380 common restart_syscall2 sys_restart_syscall + # # x32-specific system call numbers start at 512 to avoid cache impact # for native 64-bit operation. diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 6b952e1d8db8..b9f5133867f3 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -761,35 +761,17 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs) static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs) { /* - * This function is fundamentally broken as currently - * implemented. - * - * The idea is that we want to trigger a call to the - * restart_block() syscall and that we want in_ia32_syscall(), - * in_x32_syscall(), etc. to match whatever they were in the - * syscall being restarted. We assume that the syscall - * instruction at (regs->ip - 2) matches whatever syscall - * instruction we used to enter in the first place. - * - * The problem is that we can get here when ptrace pokes - * syscall-like values into regs even if we're not in a syscall - * at all. - * - * For now, we maintain historical behavior and guess based on - * stored state. We could do better by saving the actual - * syscall arch in restart_block or (with caveats on x32) by - * checking if regs->ip points to 'int $0x80'. The current - * behavior is incorrect if a tracer has a different bitness - * than the tracee. + * We can't reliably distinguish between restarting an i386 + * syscall and an x86_64 syscall without decoding the + * instruction at regs->ip -= 2. Rather than trying, restart + * using __NR_restart_syscall2, which works regardless of ABI. + * We still want to preserve the x32 state, but we can do that + * directly. */ -#ifdef CONFIG_IA32_EMULATION - if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED)) - return __NR_ia32_restart_syscall; -#endif #ifdef CONFIG_X86_X32_ABI - return __NR_restart_syscall | (regs->orig_ax & __X32_SYSCALL_BIT); + return __NR_restart_syscall2 | (regs->orig_ax & __X32_SYSCALL_BIT); #else - return __NR_restart_syscall; + return __NR_restart_syscall2; #endif } -- 2.5.5