Michal Suchánek <[email protected]> writes:

> On Thu, Jul 02, 2026 at 02:01:02PM +0200, Sven Schnelle wrote:
>> Michal Suchánek <[email protected]> writes:
>> 
>> > On Thu, Jul 02, 2026 at 10:12:35AM +0200, Sven Schnelle wrote:
>> >> Michal Suchánek <[email protected]> writes:
>> >> 
>> >> > The return value of syscall_enter_from_user_mode is used both for the
>> >> > adjusted syscall number and the indicator that a syscall should be
>> >> > skipped.
>> >> >
>> >> > As seccomp can be invoked on any syscall, including invalid ones, this
>> >> > somewhat undermines seccomp.
>> >> >
>> >> > While the seccomp variants that terminate the process do not need to
>> >> > care about this, for the filter that sets the syscall return value this
>> >> > distinction is required.
>> >> >
>> >> > Pass the syscall number as a pointer to the inline entry functions, and
>> >> > use the return value exclusively for the indication that the syscall is
>> >> > already handled.
>> >> >
>> >> > This should avoid the need for the s390 PIF_SYSCALL_RET_SET which is the
>> >> > workaround for exactly this deficiency.
>> >> 
>> >> I'm not sure whether PIF_SYSCALL_RET_SET can be removed - the syscall
>> >> return might still get set by PTRACE_SET_SYSCALL_INFO when the tracee is
>> >> stopped. This might be a positive number which can't be distinguished
>> >> from a syscall number. But maybe I'm missing something? It's been quite
>> >> a while since I touched all that ptrace stuff.
>> >
>> > When the syscall return value is set (in the registers) the return value
>> > which is also the modified syscall number is set to -1 indicating the
>> > syscall was handled. At least that's how the API is described.
>> >
>> > So yes, if the syscall number range is restricted or the syscall number
>> > is returned through a path different from the function return value the
>> > flag should not be needed in the entry path because the case can be
>> > detected through the return value alone.
>> 
>> I'm still failing to see how this would work without an additional
>> flag. Assume a program (the tracee) is stopped because of a syscall
>> entry. The tracer then decides to skip the syscall and changes
>> regs->gpr2 (which contains either the syscall number or return value)
>> to contain 42. When the tracer then restarts the syscall, how does
>> do_syscall() know that gpr2 is now a return value and not a syscall number?
>
> Because then the return value from the syscall_enter_from_user_mode
> machinery would be -1, indicating the syscall should be skipped. That is
> how the return value of syscall_enter_from_user_mode is documented; I
> did not verify that it actually works that way for the tracing case on
> s390.

I read the code and tested it. I think I confused how syscalls are
intercepted by seccomp vs. ptrace. PIF_SYSCALL_RET_SET is indeed
only required to indicate syscalls skipped via seccomp, not ptrace.
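
For illustration, here is a minimal user-space sketch of the two calling
conventions discussed in this thread. It is not the kernel code: struct
fake_regs, enter_old(), enter_new(), the dispatch helpers and NR_SYSCALLS
are hypothetical stand-ins, and the "filter" is reduced to a flag. It only
shows why a single return value cannot tell "user space asked for syscall
-1" apart from "the syscall was already handled", and how passing the
number through a pointer would remove that ambiguity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_SYSCALLS 512L /* hypothetical table size, not the s390 value */

/* Hypothetical stand-in for pt_regs: only what this sketch needs. */
struct fake_regs {
	long nr;              /* syscall number requested by user space */
	long ret;             /* register that carries the return value */
	bool filter_sets_ret; /* pretend a filter decided to set a return value */
	long filter_ret;      /* the value that filter wants to return */
};

/* Old shape: the return value is both the syscall number and the
 * "skip" indicator (-1). */
static long enter_old(struct fake_regs *regs)
{
	if (regs->filter_sets_ret) {
		regs->ret = regs->filter_ret;
		return -1;
	}
	return regs->nr; /* may itself be -1 if user space asked for it */
}

/* Proposed shape: number via pointer, return value only says "handled". */
static bool enter_new(struct fake_regs *regs, long *nr)
{
	if (regs->filter_sets_ret) {
		regs->ret = regs->filter_ret;
		return true;
	}
	*nr = regs->nr;
	return false;
}

static long dispatch_old(struct fake_regs *regs)
{
	long nr = enter_old(regs);

	if (nr == -1)
		return regs->ret; /* looks "handled": ret is left untouched */
	/* 0 stands in for the result of a real handler. */
	regs->ret = (nr < 0 || nr >= NR_SYSCALLS) ? -ENOSYS : 0;
	return regs->ret;
}

static long dispatch_new(struct fake_regs *regs)
{
	long nr = regs->nr;

	if (enter_new(regs, &nr))
		return regs->ret; /* really handled by the filter */
	regs->ret = (nr < 0 || nr >= NR_SYSCALLS) ? -ENOSYS : 0;
	return regs->ret;
}

/* Usage: an invalid syscall number of -1 with no filter involved. */
static void demo(void)
{
	struct fake_regs a = { .nr = -1, .ret = 1234 };
	struct fake_regs b = { .nr = -1, .ret = 1234 };

	assert(dispatch_old(&a) == 1234);    /* stale value leaks through */
	assert(dispatch_new(&b) == -ENOSYS); /* invalid number is rejected */
}
```

In this model the old shape needs an extra out-of-band flag to separate
the two cases, which is the role PIF_SYSCALL_RET_SET plays in the
discussion above; whether the real entry path can drop it depends on
details this sketch does not cover.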
