On Thu, Sep 10, 2015 at 12:04 PM, Denys Vlasenko <dvlas...@redhat.com> wrote:
> On 09/10/2015 12:01 AM, Andy Lutomirski wrote:
>> On Mon, Sep 7, 2015 at 8:56 AM, Denys Vlasenko <dvlas...@redhat.com> wrote:
>>> This new test checks that all x86 registers are preserved across
>>> 32-bit syscalls. It tests syscalls through VDSO (if available)
>>> and through INT 0x80, normally and under ptrace.
>>>
>>> If kernel is a 64-bit one, high registers (r8..r15) are poisoned
>>> before the syscall is called and are checked afterwards.
>>>
>>> They must be either preserved, or cleared to zero (but r11 is special);
>>> r12..15 must be preserved for INT 0x80.
>>>
>>> EFLAGS is checked for changes too, but change there is not
>>> considered to be a bug (paravirt kernels do not preserve
>>> arithmetic flags).
>>>
>>> Run-tested on 64-bit kernel:
>>>
>>> $ ./test_syscall_vdso_32
>>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>>> [OK]    Arguments are preserved across syscall
>>> [NOTE]  R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
>>> [OK]    R8..R15 did not leak kernel data
>>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>>> [OK]    Arguments are preserved across syscall
>>> [OK]    R8..R15 did not leak kernel data
>>> [RUN]   Running tests under ptrace
>>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>>> [OK]    Arguments are preserved across syscall
>>> [OK]    R8..R15 did not leak kernel data
>>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>>> [OK]    Arguments are preserved across syscall
>>> [OK]    R8..R15 did not leak kernel data
>>>
>>> On 32-bit paravirt kernel:
>>>
>>> $ ./test_syscall_vdso_32
>>> [NOTE]  Not a 64-bit kernel, won't test R8..R15 leaks
>>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>>> [WARN]  Flags before=0000000000200ed7 id 0 00 o d i s z 0 a 0 p 1 c
>>> [WARN]  Flags  after=0000000000200246 id 0 00 i z 0 0 p 1
>>> [WARN]  Flags change=0000000000000c91 0 00 o d s 0 a 0 0 c
>>> [OK]    Arguments are preserved across syscall
>>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>>> [OK]    Arguments are preserved across syscall
>>> [RUN]   Running tests under ptrace
>>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>>> [OK]    Arguments are preserved across syscall
>>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>>> [OK]    Arguments are preserved across syscall
>>>
>>> Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
>>> CC: Linus Torvalds <torva...@linux-foundation.org>
>>> CC: Steven Rostedt <rost...@goodmis.org>
>>> CC: Ingo Molnar <mi...@kernel.org>
>>> CC: Borislav Petkov <b...@alien8.de>
>>> CC: "H. Peter Anvin" <h...@zytor.com>
>>> CC: Andy Lutomirski <l...@amacapital.net>
>>> CC: Oleg Nesterov <o...@redhat.com>
>>> CC: Frederic Weisbecker <fweis...@gmail.com>
>>> CC: Alexei Starovoitov <a...@plumgrid.com>
>>> CC: Will Drewry <w...@chromium.org>
>>> CC: Kees Cook <keesc...@chromium.org>
>>> CC: x...@kernel.org
>>> CC: linux-kernel@vger.kernel.org
>>
>> Acked-by: Andy Lutomirski <l...@kernel.org>
>>
>> with minor caveats below, none of which are show-stoppers...
>>
>>> +                       /* INT80 syscall entrypoint can be used by
>>> +                        * 64-bit programs too, unlike SYSCALL/SYSENTER.
>>> +                        * Therefore it must preserve R12+
>>> +                        * (they are callee-saved registers in 64-bit C 
>>> ABI).
>>> +                        *
>>> +                        * This was probably historically not intended,
>>> +                        * but R8..11 are clobbered (cleared to 0).
>>> +                        * IOW: they are the only registers which aren't
>>> +                        * preserved across INT80 syscall.
>>> +                        */
>>> +                       if (*r64 == 0 && num <= 11)
>>> +                               continue;
>>
>> Ugh.  I'll change my big entry patchset to preserve these and maybe to
>> preserve all of the 64-bit regs.
>
> If you do that, this won't change the ABI: we don't _promise_
> to save them. If we accidentally do, that means nothing.
>
> If you do that, the test won't fail. The code above does
> not require regs to be 0 - there is further code which
> also allow them to be unchanged.
>
> (I'm not very comfortable about additional six push/pops
> which are necessary for this to happen. I'm surprised
> maintainers tentatively agreed to that -
> I was grilled and asked to prove with measurements
> that *one* additional push+pop wasn't adding significant overhead).

I suspect that I need to make the series faster.

Also, int $0x80 isn't a fast path for any legitimate use case except
Debian, and I would argue that Debian is just buggy.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to