On July 2, 2026 2:49:56 PM PDT, Thomas Gleixner <[email protected]> wrote:
>On Wed, Jul 01 2026 at 11:29, H. Peter Anvin wrote:
>
>Can you please trim your replies? Scrolling through hundred lines of
>useless quoted text is just annoying.
>
>> On July 1, 2026 10:42:08 AM PDT, "Michal Suchánek" <[email protected]> wrote:
>>>-static __always_inline long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
>>>+static __always_inline long syscall_enter_from_user_mode(struct pt_regs *regs, long *syscall)
>>> {
>>>     long ret;
>>>
>
>> 1. The type for a system call is int.
>
>That ship has sailed long ago. man syscall ...
>
>> 2. A valid system call number is always going to be positive.
>
>That's true today.
>
>> 3. Bits [30:24] are available for architecture ABI use. The
>>    "architecture independent" part of the system call number is therefore
>>    24 bits wide.
>>
>> 4. The exact ABI is platform-specific, obviously, but as a general
>>    guideline (especially for new platforms/ABIs) should follow the rules
>>    for a platform "int" if practical. Notably, when passing a value in a
>>    register larger than 32 bits, which side of the calling interface is
>>    responsible for sign-extending a value passed in a register. If caller
>>    side, the kernel should validate, if callee side the kernel should
>>    ignore the additional bits and do the extension.
>
>The kernel sign expands today already, i.e. for compat syscalls.
>
>> 5. A negative system call number is guaranteed to return -ENOSYS
>>    (unless intercepted by seccomp, ptrace, or another mechanism under
>>    user space control.)
>
>That's true today.
>
>ASM entry:
>       regs->eax = -ENOSYS;
>
>C entry:
>       nr = syscall_enter_from_user_mode(regs, nr);
>
>       if ((unsigned)nr < SYSCALL_MAX)
>                   regs->eax = handle_syscall();
>       else if (nr != -1)
>                   regs->eax = -ENOSYS;
>
>       ....
>
>If seccomp overwrites regs->eax and aborts any syscall (including -1) by
>returning -1, then the value seccomp wrote into regs->eax is preserved
>and returned to user space.
>
>The same applies for syscall_user_dispatch() and ptrace...() if they
>decide to overwrite regs->eax _and_ abort the syscall by letting
>syscall_enter_from_user_mode() return -1.
>
>trace_syscall_enter() is not any different. If the magic BPF in there
>rewrites the syscall number to -1 then either the original -ENOSYS or
>the BPF induced overwrite is returned to user space.
>
>It's less than obvious and I have no objections to clean that up and
>make it more intuitive, but I still fail to see what Michal is actually
>trying to solve and what the magic flag is for. If s390 requires it,
>then that's an s390 problem, but definitely x86 does not.
>
>> 6. If the platform needs to algorithmically modify the system call
>>    number due to platform-specific concerns (say, the platform uses a
>>    16-bit special purpose register for the syscall number, or it has
>>    multiple kernel entry points with different behavior), it should if at
>>    all possible transcode the system call number as necessary to match
>>    this convention in APIs that are exposed to general kernel code.
>>
>> For example, in the future I could very much see the IA32 code in the
>> x86 kernel using bit 29 internally to indicate an ia32 system call,
>> simplifying the is_compat implementation on x86.
>
>I don't see how that makes it simpler. Those are two different entry
>code paths and magic bits won't make that go away.
>
>> It should not mean that passing bit 29 to either the syscall
>> instruction or int $0x80 will be accepted.
>
>Your proposal looks even more like a solution in search of a problem
>than the original one.
>
>Thanks,
>
>        tglx
>
>

The type in syscall(3) is irrelevant. The argument passed to the kernel is 
treated as an int and sign-extended from 32 bits. 
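
To make the sign-extension point concrete, here is a minimal user-space
sketch, not kernel code. The helper name sketch_syscall_nr_from_reg is
made up for illustration, and it assumes a 64-bit register whose upper
32 bits the entry path ignores:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): model an entry path that
 * looks only at the low 32 bits of a 64-bit register and sign-extends
 * them, i.e. treats the system call number as an "int".
 *
 * The uint32_t -> int32_t conversion of values above INT32_MAX is
 * implementation-defined before C23; gcc and clang wrap modulo 2^32,
 * which is the behaviour assumed here.
 */
static long sketch_syscall_nr_from_reg(uint64_t reg)
{
	return (long)(int32_t)(uint32_t)reg;
}
```

With this model, a register holding 0x00000000ffffffff yields -1, and
garbage in the upper half (say 0xdeadbeef00000027) is simply dropped,
yielding 39.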

I'm explicitly not trying to invent things; I'm trying to document the status
quo to avoid further confusion and mistakes.
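
For reference, the dispatch behaviour described in the quoted "C entry"
pseudocode can be modelled in user space roughly as follows. This is a
sketch under assumptions: SKETCH_SYSCALL_MAX, sketch_handle_syscall and
sketch_dispatch are made-up names, and "preset" stands in for whatever
is in regs->eax when dispatch is reached (-ENOSYS from the ASM entry
unless seccomp, ptrace or similar overwrote it):

```c
#include <errno.h>

#define SKETCH_SYSCALL_MAX 4	/* hypothetical table size */

/* Stand-in for a real handler: just echoes the number back. */
static long sketch_handle_syscall(long nr)
{
	return nr;
}

/*
 * Mirrors the quoted logic: in-range numbers are handled, -1 leaves
 * the preset return value alone, anything else yields -ENOSYS.
 */
static long sketch_dispatch(long nr, long preset)
{
	if ((unsigned int)nr < SKETCH_SYSCALL_MAX)
		return sketch_handle_syscall(nr);
	else if (nr != -1)
		return -ENOSYS;

	return preset;
}
```

In this model sketch_dispatch(-2, -ENOSYS) and
sketch_dispatch(-1, -ENOSYS) both return -ENOSYS, while
sketch_dispatch(-1, -EPERM) returns the overwritten -EPERM.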

I'm sorry I muddied the waters with what was intended to be a hypothetical
example.
