On Fri, Jul 03 2026 at 08:26, Sven Schnelle wrote:
> Thomas Gleixner <[email protected]> writes:
>> It's less than obvious and I have no objections to clean that up and
>> make it more intuitive, but I still fail to see what Michal is actually
>> trying to solve and what the magic flag is for. If s390 requires it,
>> then that's an s390 problem, but definitely x86 does not.
>
> The difference between x86 and s390 is that on s390, regs->gprs[2] is
> used for both the syscall number and the syscall return value.
> That was a design mistake early in the begin about 25 years ago, but
> it's ABI now, so it cannot be changed.

Cute.

> When seccomp decides to skip a syscall, it write a return value into
> regs->gprs[2]. When syscall_enter_from_user_mode_work() returns, it
> returns this number. If it's negative all is good - the 'if (likely(nr <
> NR_syscalls))' conditiion would just catch it and skip the syscall.
>
> But if it's a positive number, the code cannot distinguish whether
> that's a return value or a syscall number.
>
> So I introduced PIF_SYSCALL_RET_SET when converting s390 to generic
> entry. This flag tells the syscall code that a return value was set in
> ptregs and the syscall should be skipped.

You also could have added a 'syscall_ret' member to pt_regs, operate on
that for the return values (seccomp, syscall...) and swap it into
gprs[2] right before returning to user space.

> I'd like to see something like the change from Michal going in - cleaned
> up of course. It would allow us to get rid of PIF_SYSCALL_RET_SET.

I have no objections against cleaning it up and making it less
convoluted.
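
To illustrate the ambiguity and the 'syscall_ret' idea, here is a rough
userspace toy model. It is not kernel code and is untested against any
tree: struct toy_regs, TOY_NR_SYSCALLS and all helper names are made up
for this sketch, and returning -1 to signal "skip" is an assumption of
the model, not something the thread specifies.

```c
#include <stdbool.h>

/* Hypothetical syscall table size, for illustration only. */
#define TOY_NR_SYSCALLS 512L

/* Toy stand-in for pt_regs: gprs[2] holds the syscall number on entry
 * and the return value on exit, as described for s390 above. */
struct toy_regs {
	long gprs[16];
	long syscall_ret;	/* hypothetical separate return value slot */
};

/* Range check modelled on the 'nr < NR_syscalls' test quoted above. */
static bool would_dispatch(long nr)
{
	return nr >= 0 && nr < TOY_NR_SYSCALLS;
}

/* Shared-register scheme: a skipping filter stores its return value in
 * gprs[2], and the caller then reads gprs[2] back as the syscall number. */
static long enter_shared(struct toy_regs *regs, bool skip, long retval)
{
	if (skip)
		regs->gprs[2] = retval;
	return regs->gprs[2];
}

/* Separate-slot scheme: the return value goes into syscall_ret and the
 * entry path reports an out-of-range number (assumption: -1) on skip. */
static long enter_separate(struct toy_regs *regs, bool skip, long retval)
{
	if (skip) {
		regs->syscall_ret = retval;
		return -1;
	}
	return regs->gprs[2];
}

/* Swap the return value into gprs[2] right before "returning to user". */
static void exit_to_user(struct toy_regs *regs)
{
	regs->gprs[2] = regs->syscall_ret;
}
```

In this model, enter_shared() with skip set and a return value of 5
yields a number that passes would_dispatch(), so the skipped call looks
like syscall 5. enter_separate() yields -1 for the same input, and the
5 only reaches gprs[2] in exit_to_user().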
