On Thu, May 28, 2026 at 7:20 PM Steven Rostedt <[email protected]> wrote:
>
> On Thu, 28 May 2026 16:01:06 -0700
> Andrii Nakryiko <[email protected]> wrote:
>
> >
> > [...]
> >
> > > * Architecture-specific system calls
> > > diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> > > index a627acc8fb5f..17042d7e5e87 100644
> > > --- a/include/uapi/asm-generic/unistd.h
> > > +++ b/include/uapi/asm-generic/unistd.h
> > > @@ -863,8 +863,13 @@ __SYSCALL(__NR_listns, sys_listns)
> > > #define __NR_rseq_slice_yield 471
> > > __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
> > >
> > > +#define __NR_sframe_register 472
> > > +__SYSCALL(__NR_sframe_register, sys_sframe_register)
> > > +#define __NR_sframe_unregister 473
> > > +__SYSCALL(__NR_sframe_unregister, sys_sframe_unregister)
> > > +
> > > #undef __NR_syscalls
> > > -#define __NR_syscalls 472
> > > +#define __NR_syscalls 474
> > >
> > > /*
> > > * 32 bit systems traditionally used different
> > > diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h
> > > new file mode 100644
> > > index 000000000000..d3c9f88b024b
> > > --- /dev/null
> > > +++ b/include/uapi/linux/sframe.h
> > > @@ -0,0 +1,12 @@
> > > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> > > +#ifndef _UAPI_LINUX_SFRAME_H
> > > +#define _UAPI_LINUX_SFRAME_H
> > > +
> > > +struct sframe_setup {
> >
> > I'd add `u64 flags;` field for easier and nicer extensibility. Check
> > in the kernel that it is set to zero, future kernels will allow some
> > of the bits to be set.
>
> That sounds reasonable.
>
> >
> > And I still think that prctl() instead of a separate sframe-specific
> > syscall is the way to go. I see no reason for sframe-specific set of
> > syscalls just to set a bit of extra metadata for the entire process.
> > That seems to be the job of prctl().
>
> I personally do not have a preference. I've just heard a lot from
> others where they want to avoid extending an ioctl() like system call
> or even create a new multiplexer syscall.
>
> If we can get a consensus of using prctl() or adding a separate system
> call, I'll go with whatever that is.

prctl() is an existing multiplexing syscall used to provide
per-process (or, it seems, sometimes per-thread) hints and options.
Please consider sending a prctl() extension and CC me, and let's see
what arguments people have against extending an already existing
syscall.
>
> >
> > > + __u64 sframe_start;
> > > + __u64 sframe_size;
> > > + __u64 text_start;
> > > + __u64 text_size;
> > > +};
> > > +
> >
> > [...]
> >
> > > +
> > > +/**
> > > + * sys_sframe_register - register an address for user space stacktrace walking.
> > > + * @data: Structure of sframe data used to register the sframe section
> > > + * @size: The size of the given structure.
> > > + *
> > > + * This system call is used by dynamic library utilities to inform the kernel
> > > + * of meta data that it loaded that can be used by the kernel to know how
> > > + * to stack walk the given text locations.
> > > + *
> > > + * Return: 0 if successful, otherwise a negative error.
> > > + */
> > > +SYSCALL_DEFINE2(sframe_register, struct sframe_setup __user *, data, size_t, size)
> > > +{
> > > + struct sframe_setup sframe;
> > > +
> > > + if (sizeof(sframe) != size)
> > > + return -EINVAL;
> >
> > This seems overly aggressive. It seems like the pattern is to allow
> > sizes both smaller and bigger:
> > - if user-provided size is smaller than what kernel knows about,
> > treat missing fields as zeroes
>
> Well, that could work with unregister, but for register that isn't
> quite useful, as all fields should be filled (well, if we add flags,
> that may not be 100% true).
>
This is a question of API design. If newly added fields are optional
by default, this works great. And even if you add fields that will
later be mandatory (or mandatory depending on flags), it's super easy
to error out if they are not set. We've been doing this for years in
the bpf() syscall and it works pretty well overall, while also keeping
the user-space side (libbpf, for instance) *much* simpler. I don't
want to imagine a bpf() syscall that enforces a different size of the
bpf_attr union in each kernel version...

> > - if user-provided size is bigger, then check that space after
> > fields that kernel recognizes are all zeroes.
>
> That is dangerous. A zero with greater size could mean something. If
> the size is greater than expected it should simply fail and let user
> space call it again with the older version.
>
It could, but it shouldn't if we extend the API reasonably. And if it
so happens that zero becomes meaningful, then you add a new flag that
has to be set when that field is present. This is a solved problem.
Requiring user space to use differently sized structs for different
kernel versions is much, much worse.
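
To make the two size cases concrete, here is a rough user-space model
of the handling I have in mind. In the kernel, the existing
copy_struct_from_user() helper already does the copy/zero-check part.
Everything beyond the four fields in your patch is an assumption for
illustration only: the struct name, the `flags` and `extra` fields,
and SFRAME_F_HAS_EXTRA are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag: "the 'extra' field is present and meaningful". */
#define SFRAME_F_HAS_EXTRA (1ULL << 0)

/* Hypothetical future layout; only the first four fields are in the patch. */
struct sframe_setup_v2 {
	uint64_t sframe_start;
	uint64_t sframe_size;
	uint64_t text_start;
	uint64_t text_size;
	uint64_t flags;
	uint64_t extra;
};

/* User-space model of the size handling; this is not kernel code. */
static int sframe_copy_setup(struct sframe_setup_v2 *dst, const void *src,
			     size_t usize)
{
	size_t ksize = sizeof(*dst);
	size_t n = usize < ksize ? usize : ksize;
	const unsigned char *p = src;
	size_t i;

	/* Bigger struct from newer user space: the unknown tail must be zero. */
	for (i = n; i < usize; i++)
		if (p[i])
			return -E2BIG;

	/* Smaller struct from older user space: missing fields read as zero. */
	memset(dst, 0, ksize);
	memcpy(dst, p, n);

	/* Reject flag bits this "kernel" does not know about. */
	if (dst->flags & ~SFRAME_F_HAS_EXTRA)
		return -EINVAL;

	/* 'extra' is only accepted when the flag announces it. */
	if (!(dst->flags & SFRAME_F_HAS_EXTRA) && dst->extra)
		return -EINVAL;

	return 0;
}
```

With this shape, a 32-byte struct from old user space and a larger,
zero-padded struct from new user space are both accepted, and a zero
in `extra` can still be made meaningful by gating it on the flag.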
> >
> > This allows extensibility without having to change user space code all
> > the time. Old code will provide smaller struct without new (presumably
> > optional) fields, while newer code can use newer and larger struct
> > size, but as long as it clears extra fields old kernel will be fine
> > with that.
>
> The old size will always work, thus old code will always continue to
> work. If we extend the system call, then it must handle both the older
> size as well as the newer size. User space would not need to change. It
> would only change if it wanted to use a new feature, and if it wants to
> work with older kernels it would need to try the bigger size first and
> if that fails, it knows the kernel doesn't support that new feature and
> then user space can figure out what to do. Either use the old system
> call or abort.

See above: many added features are typically optional (e.g., imagine
some extra bits of information that go along with the currently
existing mandatory sframe data). And it's easy to write user-space
code that automatically and gracefully "downgrades" by detecting that
the kernel doesn't support some feature and then not setting the
field, leaving it zero. But you won't have to track what the right
size of the struct should be when the struct in your API headers is
already larger because you compiled against newer kernel headers.

Believe me, this is the right way to go with this kind of extensible
binary API.
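
As a sketch of what that downgrade could look like on the user-space
side: the caller always passes sizeof() of whatever struct its headers
define, and on failure only clears the optional field and retries.
Again, the struct layout, SFRAME_F_HAS_EXTRA, `extra`, and both
function names are hypothetical, and the "old kernel" here is just a
stand-in function so the example can run without the real syscall.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag announcing the optional 'extra' field. */
#define SFRAME_F_HAS_EXTRA (1ULL << 0)

/* Hypothetical newer layout as seen through newer uapi headers. */
struct sframe_setup_v2 {
	uint64_t sframe_start;
	uint64_t sframe_size;
	uint64_t text_start;
	uint64_t text_size;
	uint64_t flags;
	uint64_t extra;
};

/* Stand-in for the real registration call. */
typedef int (*sframe_register_fn)(const void *data, size_t size);

/* Model of an older kernel that only knows the first four fields. */
static int old_kernel_register(const void *data, size_t size)
{
	const unsigned char *p = data;
	size_t known = 4 * sizeof(uint64_t);
	size_t i;

	/* Anything past the known fields must be zero. */
	for (i = known; i < size; i++)
		if (p[i])
			return -E2BIG;
	return 0;
}

/* Try with the optional feature first, then retry without it. */
static int sframe_register_compat(sframe_register_fn reg,
				  struct sframe_setup_v2 *s)
{
	int err = reg(s, sizeof(*s));

	if ((err == -E2BIG || err == -EINVAL) &&
	    (s->flags & SFRAME_F_HAS_EXTRA)) {
		/* Same struct, same size: just leave the optional part zero. */
		s->flags &= ~SFRAME_F_HAS_EXTRA;
		s->extra = 0;
		err = reg(s, sizeof(*s));
	}
	return err;
}
```

Note that the retry never changes the size passed in, so user space
does not need to know which struct size each kernel version expects.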
>
> -- Steve
>
> >
> > > +
> > > + if (copy_from_user(&sframe, data, size))
> > > + return -EFAULT;
> > > +
> > > + return sframe_add_section(sframe.sframe_start,
> > > + sframe.sframe_start + sframe.sframe_size,
> > > + sframe.text_start,
> > > + sframe.text_start + sframe.text_size);
> > > +}
> > > +
> >
> > [...]
>