Hi Tiwei,

On Tue, 2024-05-28 at 18:16 +0800, Tiwei Bie wrote:
> On 5/28/24 4:54 PM, benja...@sipsolutions.net wrote:
> > From: Benjamin Berg <benjamin.b...@intel.com>
> > 
> > Newer glibc versions are enabling rseq support by default. This remains
> > enabled in the cloned child process, potentially causing the host kernel
> > to write/read memory in the child.
> > 
> > It appears that this was purely not an issue because the used memory
> > area happened to be above TASK_SIZE and remains mapped.
> 
> I also encountered this issue. In my case, with "Force a static link"
> (CONFIG_STATIC_LINK) enabled, UML will crash immediately every time
> it starts up. I worked around this by setting the glibc.pthread.rseq
> tunable via GLIBC_TUNABLES [1] before launching UML.
> 
> So another easy way to work around this issue without introducing runtime
> overhead might be to add the GLIBC_TUNABLES=glibc.pthread.rseq=0 environment
> variable and exec /proc/self/exe in UML on startup.
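For reference, the re-exec workaround described above could look roughly
like the sketch below. This is only an illustration and not UML code: the
helper names (rseq_tunable_present, reexec_without_rseq) and the buffer
size are made up, and it assumes a glibc that knows the
glibc.pthread.rseq tunable and a mounted /proc.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define RSEQ_TUNABLE "glibc.pthread.rseq=0"

/* Returns 1 if the rseq tunable is already part of the given value. */
static int rseq_tunable_present(const char *tunables)
{
	return tunables != NULL && strstr(tunables, RSEQ_TUNABLE) != NULL;
}

/*
 * Hypothetical helper, meant to be called first thing in main().
 * Returns 0 if the tunable is already set (nothing to do) and -1 on
 * failure; on success execv() replaces the process and never returns.
 */
static int reexec_without_rseq(char **argv)
{
	const char *cur = getenv("GLIBC_TUNABLES");
	char buf[4096];
	int len;

	/* This check is also what stops the re-exec from looping. */
	if (rseq_tunable_present(cur))
		return 0;

	/* Tunables are a colon-separated list, so keep any existing ones. */
	if (cur != NULL && cur[0] != '\0')
		len = snprintf(buf, sizeof(buf), "%s:%s", cur, RSEQ_TUNABLE);
	else
		len = snprintf(buf, sizeof(buf), "%s", RSEQ_TUNABLE);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -1;

	if (setenv("GLIBC_TUNABLES", buf, 1) != 0)
		return -1;

	execv("/proc/self/exe", argv);
	return -1; /* only reached if execv() failed */
}
```

The caller would invoke reexec_without_rseq(argv) before anything else in
main() and treat a return value of -1 as "continue with rseq enabled" or
as a fatal error, whichever is preferred.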

I am not really worried about the overhead, but I agree that setting
GLIBC_TUNABLES is also a reasonable solution to the problem.

Doing the memfd/execveat dance with an embedded static binary would
still be best in my view, but either this or GLIBC_TUNABLES seems fine
in the meantime.

Do you want to submit the patch? Should I re-roll the patchset with
GLIBC_TUNABLES?

Benjamin

> [1] https://www.gnu.org/software/libc/manual/html_node/Tunables.html
> 
> Regards,
> Tiwei
> 
> > 
> > Note that a better approach would be to exec a small static binary that
> > does not link with other libraries. Using a memfd and execveat the
> > binary could be embedded into UML itself and it would result in an
> > entirely clean execution environment for userspace.
> > 
> > Signed-off-by: Benjamin Berg <benjamin.b...@intel.com>
> > ---
> >  arch/um/os-Linux/skas/process.c | 54 ++++++++++++++++++++++++++++++---
> >  1 file changed, 50 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/um/os-Linux/skas/process.c
> > b/arch/um/os-Linux/skas/process.c
> > index 41a288dcfc34..ee332a2aeea6 100644
> > --- a/arch/um/os-Linux/skas/process.c
> > +++ b/arch/um/os-Linux/skas/process.c
> > @@ -255,6 +255,31 @@ static int userspace_tramp(void *stack)
> >  int userspace_pid[NR_CPUS];
> >  int kill_userspace_mm[NR_CPUS];
> >  
> > +struct tramp_data {
> > +	int pid;
> > +	void *clone_sp;
> > +	void *stack;
> > +};
> > +
> > +static int userspace_tramp_clone_vm(void *data)
> > +{
> > +	struct tramp_data *tramp_data = data;
> > +
> > +	/*
> > +	 * This helper exists to do a double-clone. First with CLONE_VM which
> > +	 * effectively disables things like rseq, and then the second one to
> > +	 * get a new memory space.
> > +	 */
> > +
> > +	tramp_data->pid = clone(userspace_tramp, tramp_data->clone_sp,
> > +				CLONE_PARENT | CLONE_FILES | SIGCHLD,
> > +				tramp_data->stack);
> > +	if (tramp_data->pid < 0)
> > +		tramp_data->pid = -errno;
> > +
> > +	exit(0);
> > +}
> > +
> >  /**
> >   * start_userspace() - prepare a new userspace process
> >   * @stub_stack:	pointer to the stub stack.
> > @@ -268,9 +293,10 @@ int kill_userspace_mm[NR_CPUS];
> >   */
> >  int start_userspace(unsigned long stub_stack)
> >  {
> > +	struct tramp_data tramp_data;
> >  	void *stack;
> >  	unsigned long sp;
> > -	int pid, status, n, flags, err;
> > +	int pid, status, n, err;
> >  
> >  	/* setup a temporary stack page */
> >  	stack = mmap(NULL, UM_KERN_PAGE_SIZE,
> > @@ -286,10 +312,13 @@ int start_userspace(unsigned long stub_stack)
> >  	/* set stack pointer to the end of the stack page, so it can grow
> > downwards */
> >  	sp = (unsigned long)stack + UM_KERN_PAGE_SIZE;
> >  
> > -	flags = CLONE_FILES | SIGCHLD;
> > +	tramp_data.stack = (void *) stub_stack;
> > +	tramp_data.clone_sp = (void *) sp;
> > +	tramp_data.pid = -EINVAL;
> >  
> >  	/* clone into new userspace process */
> > -	pid = clone(userspace_tramp, (void *) sp, flags, (void *) stub_stack);
> > +	pid = clone(userspace_tramp_clone_vm, (void *) sp,
> > +		    CLONE_VM | CLONE_FILES | SIGCHLD, &tramp_data);
> >  	if (pid < 0) {
> >  		err = -errno;
> >  		printk(UM_KERN_ERR "%s : clone failed, errno = %d\n",
> > @@ -305,7 +334,24 @@ int start_userspace(unsigned long stub_stack)
> >  			       __func__, errno);
> >  			goto out_kill;
> >  		}
> > -	} while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGALRM));
> > +	} while (!WIFEXITED(status));
> > +
> > +	pid = tramp_data.pid;
> > +	if (pid < 0) {
> > +		printk(UM_KERN_ERR "%s : second clone failed, errno = %d\n",
> > +		       __func__, -pid);
> > +		return pid;
> > +	}
> > +
> > +	do {
> > +		CATCH_EINTR(n = waitpid(pid, &status, WUNTRACED | __WALL));
> > +		if (n < 0) {
> > +			err = -errno;
> > +			printk(UM_KERN_ERR "%s : wait failed, errno = %d\n",
> > +			       __func__, errno);
> > +			goto out_kill;
> > +		}
> > +	} while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGALRM));
> >  
> >  	if (!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGSTOP)) {
> >  		err = -EINVAL;
> > 
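
As a standalone illustration of the double-clone idea in the quoted patch,
here is a minimal userspace sketch. It is not the UML code: all names
(dc_data, dc_helper, double_clone, sketch_child, SKETCH_STACK_SIZE) are
hypothetical, it uses two separate stacks instead of reusing one, and it
calls _exit() in the helper so that no atexit handlers run in the shared
address space. The first clone() uses CLONE_VM, so the helper can report
the result through shared memory; the second clone() drops CLONE_VM to get
a new memory space and uses CLONE_PARENT so the final child can be waited
on by the original caller.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_STACK_SIZE (64 * 1024)

struct dc_data {
	int pid;		/* result of the second clone, or -errno */
	int (*fn)(void *);	/* entry point of the final child */
	void *arg;
	void *child_sp;		/* top of the stack for the final child */
};

/* Runs in the CLONE_VM helper, so writes to *p are visible to the caller. */
static int dc_helper(void *p)
{
	struct dc_data *d = p;

	d->pid = clone(d->fn, d->child_sp, CLONE_PARENT | SIGCHLD, d->arg);
	if (d->pid < 0)
		d->pid = -errno;

	_exit(0);
}

/* Returns the pid of the final child (a child of the caller), or -errno. */
static int double_clone(int (*fn)(void *), void *arg)
{
	struct dc_data d = { .pid = -EINVAL, .fn = fn, .arg = arg };
	char *stacks;
	int helper, status, ret;

	/* Lower half: helper stack. Upper half: final child stack. */
	stacks = mmap(NULL, 2 * SKETCH_STACK_SIZE, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (stacks == MAP_FAILED)
		return -errno;

	d.child_sp = stacks + 2 * SKETCH_STACK_SIZE;

	helper = clone(dc_helper, stacks + SKETCH_STACK_SIZE,
		       CLONE_VM | SIGCHLD, &d);
	if (helper < 0) {
		ret = -errno;
		goto out;
	}

	/* The helper only exits after it has stored the result in d.pid. */
	do {
		ret = waitpid(helper, &status, 0);
	} while (ret < 0 && errno == EINTR);

	ret = (ret < 0) ? -errno : d.pid;
out:
	munmap(stacks, 2 * SKETCH_STACK_SIZE);
	return ret;
}

/* Example child: its return value becomes the exit status. */
static int sketch_child(void *arg)
{
	(void)arg;
	return 42;
}
```

A caller would use double_clone(sketch_child, NULL) and then waitpid() on
the returned pid, much like start_userspace() waits on tramp_data.pid after
the helper has exited.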