On Thu, 19 Jun 2025 19:31:53 +0900, Benjamin Berg wrote:
> > diff --git a/arch/x86/um/nommu/do_syscall_64.c
> > b/arch/x86/um/nommu/do_syscall_64.c
> > new file mode 100644
> > index 000000000000..5d0fa83e7fdc
> > --- /dev/null
> > +++ b/arch/x86/um/nommu/do_syscall_64.c
> > @@ -0,0 +1,37 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/ptrace.h>
> > +#include <kern_util.h>
> > +#include <sysdep/syscalls.h>
> > +#include <os.h>
> > +
> > +__visible void do_syscall_64(struct pt_regs *regs)
> > +{
> > +	int syscall;
> > +
> > +	syscall = PT_SYSCALL_NR(regs->regs.gp);
> > +	UPT_SYSCALL_NR(&regs->regs) = syscall;
> > +
> > +	pr_debug("syscall(%d) (current=%lx) (fn=%lx)\n",
> > +		 syscall, (unsigned long)current,
> > +		 (unsigned long)sys_call_table[syscall]);
>
> You probably want to drop the pr_debug from the syscall path.

Okay, I'll update those parts.

> > +	if (likely(syscall < NR_syscalls)) {
> > +		PT_REGS_SET_SYSCALL_RETURN(regs,
> > +				EXECUTE_SYSCALL(syscall, regs));
> > +	}
> > +
> > +	pr_debug("syscall(%d) --> %lx\n", syscall,
> > +		 regs->regs.gp[HOST_AX]);
> > +
> > +	PT_REGS_SYSCALL_RET(regs) = regs->regs.gp[HOST_AX];
> > +
> > +	/* execve succeeded */
> > +	if (syscall == __NR_execve && regs->regs.gp[HOST_AX] == 0)
> > +		userspace(&current->thread.regs.regs);
>
> That said, this is what I am stumbling over. Why do you need to jump
> into userspace() here? It seems odd to me to need a special case in the
> syscall path itself.
> Aren't there other possibilities to hook/override the kernel task
> state?

Thanks, I found that this is a leftover from our early implementation,
which didn't have a proper schedule upon exit from a syscall. We can
remove this part, and I'll fix it after more investigation.

> > +	/* force do_signal() --> is_syscall() */
> > +	set_thread_flag(TIF_SIGPENDING);
> > +	interrupt_end();
>
> Same here. The MMU UML code seems to also do this, but restricted to
> ptrace'd processes?
> Maybe I am just missing something obvious …

nommu doesn't have a separate process/context to indicate a schedule to
the context here (syscall). Without that part, we do not have a chance to
schedule tasks and signals to userspace.

But forcing the SIGPENDING flag is not actually needed, so I'll remove
that part. Thanks for pointing that out.

-- Hajime