On Fri, Jul 03 2026 at 08:26, Sven Schnelle wrote:
> Thomas Gleixner <[email protected]> writes:
>> It's less than obvious and I have no objections to clean that up and
>> make it more intuitive, but I still fail to see what Michal is actually
>> trying to solve and what the magic flag is for. If s390 requires it,
>> then that's an s390 problem, but definitely x86 does not.
>
> The difference between x86 and s390 is that on s390, regs->gprs[2] is
> used for both the syscall number and the syscall return value.
> That was a design mistake made early on, about 25 years ago, but
> it's ABI now, so it cannot be changed.

Cute.

> When seccomp decides to skip a syscall, it writes a return value into
> regs->gprs[2]. When syscall_enter_from_user_mode_work() returns, it
> returns this number. If it's negative all is good - the 'if (likely(nr <
> NR_syscalls))' condition would just catch it and skip the syscall.
>
> But if it's a positive number, the code cannot distinguish whether
> that's a return value or a syscall number.
>
> So I introduced PIF_SYSCALL_RET_SET when converting s390 to generic
> entry. This flag tells the syscall code that a return value was set in
> ptregs and the syscall should be skipped.
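
To make the ambiguity concrete, here is a minimal user-space sketch of
the situation described above. It is not the s390 entry code: the
struct, the NR_SYSCALLS value, the ret_set field (standing in for
PIF_SYSCALL_RET_SET) and all sketch_* helpers are hypothetical names
made up for illustration.

```c
#include <stdbool.h>

#define NR_SYSCALLS 512 /* hypothetical table size, for this sketch only */

/* Simplified stand-in for pt_regs: gprs[2] carries the syscall number
 * on entry and the return value on exit. */
struct sketch_regs {
	long gprs[16];
	bool ret_set; /* plays the role of PIF_SYSCALL_RET_SET */
};

/* Hypothetical seccomp-style skip: store the result in gprs[2] and
 * record that a return value is now present. */
static void sketch_skip_syscall(struct sketch_regs *regs, long ret)
{
	regs->gprs[2] = ret;
	regs->ret_set = true;
}

/* Range check only. A negative value becomes a huge unsigned number
 * and fails the check; a small positive value passes it. */
static bool sketch_in_range(const struct sketch_regs *regs)
{
	return (unsigned long)regs->gprs[2] < NR_SYSCALLS;
}

/* Range check plus the flag: a set return value always wins. */
static bool sketch_should_dispatch(const struct sketch_regs *regs)
{
	return !regs->ret_set && sketch_in_range(regs);
}
```

With a skip value of -1 the range check alone is enough. With a skip
value of 5, sketch_in_range() still reports a valid syscall number, and
only the flag stops the dispatch.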

You also could have added a 'syscall_ret' member to pt_regs, operate
on that for the return values (seccomp, syscall...) and swap it into
gprs[2] right before returning to user space.
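
A rough user-space sketch of that idea, under the assumption that every
return-value writer goes through one helper. The struct layout and the
alt_* helper names are hypothetical and only illustrate the shape; how
a skip is signalled to the dispatch code is left out.

```c
/* Simplified stand-in for pt_regs with a separate return slot. */
struct alt_regs {
	unsigned long gprs[16];
	long syscall_ret; /* hypothetical new member */
};

/* All return-value writers (seccomp, the syscall itself, ...) store
 * here instead of touching gprs[2]. */
static void alt_set_return(struct alt_regs *regs, long ret)
{
	regs->syscall_ret = ret;
}

/* gprs[2] keeps meaning "syscall number" for the whole entry path. */
static unsigned long alt_syscall_nr(const struct alt_regs *regs)
{
	return regs->gprs[2];
}

/* Last step before going back to user space: move the return value
 * into the register the ABI expects it in. */
static void alt_exit_to_user(struct alt_regs *regs)
{
	regs->gprs[2] = (unsigned long)regs->syscall_ret;
}
```

In this shape gprs[2] is never overloaded while the kernel is still
deciding what to do, so a positive return value cannot be mistaken for
a syscall number.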

> I'd like to see something like the change from Michal going in - cleaned
> up of course. It would allow us to get rid of PIF_SYSCALL_RET_SET.

I have no objections against cleaning it up and making it less
convoluted.
