On Thu, 28 May 2026 16:01:06 -0700
Andrii Nakryiko <[email protected]> wrote:
>
> [...]
>
> > * Architecture-specific system calls
> > diff --git a/include/uapi/asm-generic/unistd.h
> > b/include/uapi/asm-generic/unistd.h
> > index a627acc8fb5f..17042d7e5e87 100644
> > --- a/include/uapi/asm-generic/unistd.h
> > +++ b/include/uapi/asm-generic/unistd.h
> > @@ -863,8 +863,13 @@ __SYSCALL(__NR_listns, sys_listns)
> > #define __NR_rseq_slice_yield 471
> > __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
> >
> > +#define __NR_sframe_register 472
> > +__SYSCALL(__NR_sframe_register, sys_sframe_register)
> > +#define __NR_sframe_unregister 473
> > +__SYSCALL(__NR_sframe_unregister, sys_sframe_unregister)
> > +
> > #undef __NR_syscalls
> > -#define __NR_syscalls 472
> > +#define __NR_syscalls 474
> >
> > /*
> > * 32 bit systems traditionally used different
> > diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h
> > new file mode 100644
> > index 000000000000..d3c9f88b024b
> > --- /dev/null
> > +++ b/include/uapi/linux/sframe.h
> > @@ -0,0 +1,12 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> > +#ifndef _UAPI_LINUX_SFRAME_H
> > +#define _UAPI_LINUX_SFRAME_H
> > +
> > +struct sframe_setup {
>
> I'd add `u64 flags;` field for easier and nicer extensibility. Check
> in the kernel that it is set to zero, future kernels will allow some
> of the bits to be set.
That sounds reasonable.
>
> And I still think that prctl() instead of a separate sframe-specific
> syscall is the way to go. I see no reason for sframe-specific set of
> syscalls just to set a bit of extra metadata for the entire process.
> That seems to be the job of prctl().
I personally do not have a preference. I've just heard a lot from
others where they want to avoid extending an ioctl() like system call
or even create a new multiplexer syscall.
If we can get a consensus of using prctl() or adding a separate system
call, I'll go with whatever that is.
>
> > + __u64 sframe_start;
> > + __u64 sframe_size;
> > + __u64 text_start;
> > + __u64 text_size;
> > +};
> > +
>
> [...]
>
> > +
> > +/**
> > + * sys_sframe_register - register an address for user space stacktrace
> > walking.
> > + * @data: Structure of sframe data used to register the sframe section
> > + * @size: The size of the given structure.
> > + *
> > + * This system call is used by dynamic library utilities to inform the
> > kernel
> > + * of meta data that it loaded that can be used by the kernel to know how
> > + * to stack walk the given text locations.
> > + *
> > + * Return: 0 if successful, otherwise a negative error.
> > + */
> > +SYSCALL_DEFINE2(sframe_register, struct sframe_setup __user *, data,
> > size_t, size)
> > +{
> > + struct sframe_setup sframe;
> > +
> > + if (sizeof(sframe) != size)
> > + return -EINVAL;
>
> This seems overly aggressive. It seems like the pattern is to allow
> sizes both smaller and bigger:
> - if user-provided size is smaller than what kernel knows about,
> treat missing fields as zeroes
Well, that could work with unregister, but for register that isn't
quite useful, as all fields should be filled (well, if we add flags,
that may not be 100% true).
> - if user-provided size is bigger, then check that space after
> fields that kernel recognizes are all zeroes.
That is dangerous. A zero with greater size could mean something. If
the size is greater than expected it should simply fail and let user
space call it again with the older version.
>
> This allows extensibility without having to change user space code all
> the time. Old code will provide smaller struct without new (presumably
> optional) fields, while newer code can use newer and larger struct
> size, but as long as it clears extra fields old kernel will be fine
> with that.
The old size will always work, thus old code will always continue to
work. If we extend the system call, then it must handle both the older
size as well as the newer size. User space would not need to change. It
would only change if it wanted to use a new feature, and if it wants to
work with older kernels it would need to try the bigger size first and
if that fails, it knows the kernel doesn't support that new feature and
then user space can figure out what to do. Either use the old system
call or abort.
-- Steve
>
> > +
> > + if (copy_from_user(&sframe, data, size))
> > + return -EFAULT;
> > +
> > + return sframe_add_section(sframe.sframe_start,
> > + sframe.sframe_start + sframe.sframe_size,
> > + sframe.text_start,
> > + sframe.text_start + sframe.text_size);
> > +}
> > +
>
> [...]