On 03/24/2015 10:40 PM, Andy Lutomirski wrote:
> The syscall and sysenter stuff is IMO really nasty.  Here's how I'd
> like it to work:
>
> When you do "call __kernel_vsyscall", I want the net effect to be that
> your eax, ebx, ecx, edx, esi, edi, and ebp at the time of the call end
> up *verbatim* in pt_regs.  Your eip and rsp should be such that, if we
> iret normally using pt_regs, we end up returning correctly to
> userspace.  I want this to be true *regardless* of whether we're doing
> a fast-path or slow-path system call.
>
> This means that we have, literally (see below for why ret $4):
>
>     int $0x80
>     ret $4     <-- regs->eip points here
>
> Then we add an opportunistic return trampoline: if a special ti flag
> is set (which we set on entry here) and the return eip and regs are
> appropriate, then we change the return at the last minute to vdso code
> that looks like:
>
>     popl $ecx
>     popl $edx
>     ret

I don't fully understand your intent.

> The vdso code would be something like (so untested it's not even funny):
>
> __kernel_vsyscall:
>     ALTERNATIVE_2(something or other)
>
> __kernel_vsyscall_for_intel:
>     pushl $edx
>     pushl $ecx
>     sysenter
>     hlt     <-- just for clarity
>
> __kernel_vsyscall_for_amd:
>     pushl $ecx
>     syscall
> __vsyscall_after_syscall_insn:
>     ret $4     <-- for binary tracers only

Wouldn't this ret use the former ecx value as its return address?

> __kernel_vsyscall_for_int80:
>     int $0x80     <-- regs->eip points here during *all* vsyscalls
>
> __kernel_vsyscall_slow_ret:
>     ret $4

After returning, this will pop an extra word from the stack of the
__kernel_vsyscall() caller. Callers don't expect that.

> __kernel_vsyscall_sysretl_target:
>     popl $ecx
>     ret
>
> There is no sysexit.  Take that, Intel.
>
> On sysenter, we copy regs->cx and regs->dx from user memory and then
> we increment regs->sp by 4 and point regs->eip to
> __kernel_vsyscall_for_int80.  On syscall, we copy regs->cx from user
> memory and point regs->eip to __kernel_vsyscall_for_int80.
>
> On opportunistic sysretl, we do:
>
>     *regs->sp = regs->cx;  /* put_user or whatever */
>     regs->eip = __kernel_vsyscall_sysretl_target
>     ...
>     sysretl
>
> We never do sysexit or sysretl in any other code path.  That is, there
> is no really fast path anymore.

I still don't understand the purpose of those "ret $4" insns.
They don't look right.
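
To make the two "ret $4" concerns concrete, here is a toy C model of what the instruction does to a 32-bit stack: it pops the return address, then discards a further 4 bytes. This is only a sketch. The names (toy_stack, toy_push, toy_pop, toy_ret_imm) are made up for illustration, and the model deliberately ignores the kernel-side adjustments of regs->sp and regs->eip described in the quoted proposal, so it shows the instruction semantics in isolation rather than the proposed design as a whole.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical toy model: a 32-bit stack that grows downward.
 * sp is a word index; sp == STACK_WORDS means the stack is empty. */
struct toy_stack {
    uint32_t mem[STACK_WORDS];
    unsigned sp;
};

static void toy_init(struct toy_stack *s)
{
    s->sp = STACK_WORDS;
}

/* pushl: decrement the stack pointer, then store the value. */
static void toy_push(struct toy_stack *s, uint32_t v)
{
    assert(s->sp > 0);
    s->mem[--s->sp] = v;
}

/* popl: load the top word, then increment the stack pointer. */
static uint32_t toy_pop(struct toy_stack *s)
{
    assert(s->sp < STACK_WORDS);
    return s->mem[s->sp++];
}

/* ret $imm: pop the return address, then discard imm more bytes.
 * Returns the address that execution would jump to. */
static uint32_t toy_ret_imm(struct toy_stack *s, unsigned imm_bytes)
{
    uint32_t target = toy_pop(s);

    assert(imm_bytes % 4 == 0);
    s->sp += imm_bytes / 4;
    assert(s->sp <= STACK_WORDS);
    return target;
}
```

With a caller word, a return address, and then a pushed ecx on the stack (the "pushl $ecx; syscall; ret $4" sequence taken literally), toy_ret_imm(&s, 4) returns the saved ecx value, not the return address. With only a caller word and a return address on the stack (the plain "int $0x80; ret $4" sequence), it returns the right address but also consumes the caller's word.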