On 6/2/26 12:09, Alban Crequy wrote:
> From: Alban Crequy <[email protected]>
>
> There are two categories of users for process_vm_readv:
>
> 1. Debuggers like GDB or strace.
>
>    When a debugger attempts to read the target memory and triggers a
>    page fault, the page fault needs to be resolved so that the debugger
>    can accurately interpret the memory. A debugger is typically attached
>    to a single process.
>
> 2. Profilers like OpenTelemetry eBPF Profiler.
>
>    The profiler uses a perf event to get stack traces from all
>    processes at 20Hz (20 stack traces to resolve per second). For
>    interpreted languages (Ruby, Python, etc.), the profiler uses
>    process_vm_readv to get the correct symbols. In this case,
>    performance is the most important. It is fine if some stack traces
>    cannot be resolved as long as it is not statistically significant.
>
> The current behaviour of process_vm_readv is to resolve page faults in
> the target VM. This is as desired for debuggers, but unwelcome for
> profilers because the page fault resolution could take a lot of time
> depending on the backing filesystem. Additionally, since profilers
> monitor all processes, we don't want a slow page fault resolution for
> one target process slowing down the monitoring for all other target
> processes.
>
> This patch adds the flag PROCESS_VM_NOWAIT, so the caller can choose to
> not block on IO if the memory access causes a page fault. When a page
> is not resident and would require IO to fault in, the syscall returns
> a short read (the number of bytes successfully read before the fault)
> or -1 with errno set to EFAULT if no bytes were read.
>
> Additionally, this patch adds the flag PROCESS_VM_PIDFD to refer to the
> remote process via PID file descriptor instead of PID. Such a file
> descriptor can be obtained with pidfd_open(2). This is useful to avoid
> the pid number being reused. It is unlikely to happen for debuggers
> because they can monitor the target process termination in other ways
> (ptrace), but can be helpful in some profiling scenarios. When using
> PROCESS_VM_PIDFD, the first argument is a pidfd instead of a pid. If
> the pidfd is invalid, the syscall returns -1 with errno set to EBADF.
>
> If a given flag is unsupported, the syscall returns the error EINVAL
> without checking the buffers. This gives a way to userspace to detect
> whether the current kernel supports a specific flag:
>
>   process_vm_readv(pid, NULL, 1, NULL, 1, PROCESS_VM_PIDFD)
>   -> EINVAL if the kernel does not support the flag PROCESS_VM_PIDFD
>      (before this patch)
>   -> EFAULT if the kernel supports the flag (after this patch)
>
> Suggested man page update for process_vm_readv(2):
>
>   The flags argument is the bitwise OR of zero or more of these flags:
>
>   PROCESS_VM_PIDFD (since Linux 7.x)
>       The pid argument is a PID file descriptor (see pidfd_open(2))
>       instead of a PID number. When using this flag, the existing
>       ESRCH error applies if the process referred to by the pidfd
>       has exited.
>
>   PROCESS_VM_NOWAIT (since Linux 7.x)
>       Do not block on IO. If a page in the remote address space is not
>       resident and would require disk IO to fault in, the system call
>       returns a short read or fails with EFAULT if no bytes were read.
>
>   Additional error:
>
>   EBADF  pid is not a valid file descriptor (PROCESS_VM_PIDFD only).
>
> Signed-off-by: Alban Crequy <[email protected]>
> ---

Nothing jumped at me, thanks!

Acked-by: David Hildenbrand (Arm) <[email protected]>

--
Cheers,

David

