Peter Zijlstra <[email protected]> writes:

> On Fri, Jul 03, 2026 at 11:59:07AM +0200, Sven Schnelle wrote:
>> Thomas Gleixner <[email protected]> writes:
>>
>> > On Fri, Jul 03 2026 at 08:26, Sven Schnelle wrote:
>> >> Thomas Gleixner <[email protected]> writes:
>> >>> It's less than obvious and I have no objections to clean that up and
>> >>> make it more intuitive, but I still fail to see what Michal is actually
>> >>> trying to solve and what the magic flag is for. If s390 requires it,
>> >>> then that's an s390 problem, but definitely x86 does not.
>> >>
>> >> The difference between x86 and s390 is that on s390, regs->gprs[2] is
>> >> used for both the syscall number and the syscall return value.
>> >> That was a design mistake made early on, about 25 years ago, but
>> >> it's ABI now, so it cannot be changed.
>> >
>> > Cute.
>> >
>> >> When seccomp decides to skip a syscall, it writes a return value into
>> >> regs->gprs[2]. When syscall_enter_from_user_mode_work() returns, it
>> >> returns this number. If it's negative all is good - the 'if (likely(nr <
>> >> NR_syscalls))' condition would just catch it and skip the syscall.
>> >>
>> >> But if it's a positive number, the code cannot distinguish whether
>> >> that's a return value or a syscall number.
>> >>
>> >> So I introduced PIF_SYSCALL_RET_SET when converting s390 to generic
>> >> entry. This flag tells the syscall code that a return value was set in
>> >> ptregs and the syscall should be skipped.
>> >
>> > You also could have added a 'syscall_ret' member to pt_regs, operate
>> > on that for the return values (seccomp, syscall...) and swap it into
>> > gprs[2] right before returning to user space.
>>
>> That would likely also work, but I found it easier to read and
>> understand to have an additional flag with a descriptive name than having
>> yet another 'somehow-related-to-gpr2' member in ptregs.
>
> I find this very odd; I would think that having both syscall-nr and
> syscall-ret in separate (virtual) registers for most of the normal cycle
> would be most obvious and least surprising -- given that this is what all
> other architectures do.
>
> Entry either grabs a copy of gpr2 and preserves it in orig_gpr2 as the
> syscall nr, or as Thomas suggests, you keep syscall_ret and copy that
> into gpr2 on return to userspace (and ptrace and signal and whatever
> other surface bits are affected).
>
> Either way around you then have separate values for the entire range of
> at least the C part of the kernel syscall handling -- just like every
> other arch. How is munging things in a single value and a flag easier?

Looks like we have different opinions on that - I find the flag way
easier, and with it we need neither additional space for a long in
ptregs nor code to copy things around.
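
To make the two options concrete, here is a rough user-space sketch of
both. It is not the actual s390 entry code. Only PIF_SYSCALL_RET_SET,
gprs[2] and the 'syscall_ret' member name come from this thread;
struct sketch_regs, the flag's bit value, NR_SYSCALLS, fake_syscall()
and all helper names are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SYSCALLS 512L           /* hypothetical table size */
#define PIF_SYSCALL_RET_SET 0x1UL  /* name from the thread, bit value assumed */

/* Simplified stand-in for pt_regs, not the real s390 layout. */
struct sketch_regs {
	long gprs[16];
	unsigned long flags;
	long syscall_ret;          /* hypothetical, only used by variant B */
};

/* Stand-in for a syscall table lookup: "syscall nr" returns nr + 1000. */
static long fake_syscall(long nr)
{
	return nr + 1000;
}

/* The problem: gprs[2] alone cannot say "this is already a return value". */
static void dispatch_naive(struct sketch_regs *regs)
{
	long nr = regs->gprs[2];

	assert(regs != NULL);
	regs->gprs[2] = (nr >= 0 && nr < NR_SYSCALLS) ? fake_syscall(nr) : -ENOSYS;
}

/* Variant A: a seccomp-like skip stores the value and sets the flag. */
static void skip_with_flag(struct sketch_regs *regs, long ret)
{
	regs->gprs[2] = ret;
	regs->flags |= PIF_SYSCALL_RET_SET;
}

static void dispatch_with_flag(struct sketch_regs *regs)
{
	if (regs->flags & PIF_SYSCALL_RET_SET) {
		/* gprs[2] already holds the return value, skip the syscall. */
		regs->flags &= ~PIF_SYSCALL_RET_SET;
		return;
	}
	dispatch_naive(regs);
}

/* Variant B: number and return value live in separate places. */
static long enter_with_member(struct sketch_regs *regs, bool skip, long skip_ret)
{
	long nr = regs->gprs[2];

	regs->syscall_ret = -ENOSYS;       /* default result */
	if (skip) {
		regs->syscall_ret = skip_ret;
		return -1;                 /* tells the caller not to dispatch */
	}
	return nr;
}

static void exit_with_member(struct sketch_regs *regs, long nr)
{
	if (nr >= 0 && nr < NR_SYSCALLS)
		regs->syscall_ret = fake_syscall(nr);
	/* Swap the result into gprs[2] right before "returning to user". */
	regs->gprs[2] = regs->syscall_ret;
}
```

With a skipped syscall and a return value of 5, dispatch_naive() would
run "syscall 5", while both variants leave 5 in gprs[2]. Variant A
costs one flag bit, variant B costs one long plus the final copy.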
