On Wed, Jul 23, 2025 at 7:40 PM Kees Cook <k...@kernel.org> wrote:
>
> On Wed, Jul 23, 2025 at 11:53:40AM +0000, Aaron Ballman wrote:
> > That said, John McCall pointed out some usage patterns Apple has with
> > their existing feature:
> >
> > * 655 simple references to variables or struct members: __counted_by(len)
> > * 73 dereferences of variables or struct members: __counted_by(*lenp)
> > * 80 integer literals: __counted_by(8)
> > * 60 macro references: __counted_by(NUM_EIGHT) [1]
> > * 9 simple sizeof expressions: __counted_by(sizeof(eight_bytes_t))
> > * 28 others my script couldn’t categorize:
> >   * 7 more complicated integer constant expressions:
> > __counted_by(num_bytes_for_bits(NUM_FIFTY_SEVEN)) [2]
> >   * 16 arithmetically-adjusted references to a single variable or
> > struct member: __counted_by(2 * len + 8)
> >   * 1 nested struct member: __counted_by(header.len)
> >   * 4 combinations of struct members: __counted_by(len + cnt) [3]
> >
> > Do the Linux kernel folks think this looks somewhat like what their
> > usage patterns will be as well?
>
> Yes, this matches my expectations for its usage, though there is one
> case I don't see explicitly mentioned above, which is referencing a
> global variable (but if a function can be used, then an accessor can be
> created for returning the global).

Good to know!
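
For illustration, the accessor idea could look roughly like the sketch
below. Everything here is made up for the example (`g_table_len`,
`table_len`, `struct table`, `table_get`), and `SKETCH_COUNTED_BY` is a
stand-in macro that expands to nothing, since this assumes nothing about
whether a compiler accepts a call expression in the real attribute. The
assert is the manual form of the check the annotation would describe.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real attribute: expands to nothing so the sketch
 * compiles with any C compiler. Hypothetical name. */
#define SKETCH_COUNTED_BY(expr)

/* Hypothetical global that holds the element count. */
static size_t g_table_len = 4;

/* Hypothetical accessor: the attribute argument would call this
 * instead of naming the global directly. */
size_t table_len(void)
{
    return g_table_len;
}

struct table {
    int *entries SKETCH_COUNTED_BY(table_len());
};

/* Manual bounds check equivalent to what the annotation expresses. */
int table_get(const struct table *t, size_t idx)
{
    assert(idx < table_len());
    return t->entries[idx];
}
```

With `int data[4] = {10, 20, 30, 40}; struct table t = { data };`, a call
to `table_get(&t, 2)` reads the third element after checking the index
against `table_len()`.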

> > If so, I'd like to argue for my
> > personal stake in the ground: we don't need any new language features
> > to solve this problem, we can use the existing facilities to do so and
> > downscope the initial feature set until a better solution comes along
> > for forward references. Use two attributes: counted_by (whose argument
> > specifies an already in-scope identifier of what holds the count) and
> > counts (whose argument specifies an already in-scope identifier of
> > what it counts). e.g.,
> > ```
> > struct S {
> >   char *start_buffer;
> >   int start_len __counts(start_buffer);
> >   int end_len;
> >   char *end_buffer __counted_by(end_len);
> > };
> >
> > void func(char *start_buffer, int N __counts(start_buffer), int M,
> > char *end_buffer __counted_by(M));
> > ```
> > It's kind of gross to need two attributes to do the same notional
> > thing, but it does solve the vast majority of the usages seen in the
> > wild if you're willing to accept some awkwardness around things like:
> > ```
> > struct S {
> >   char *buffer;
> >   int *len __counts(buffer); // Note that len is a pointer
> > };
> > ```
> > because we'd need the semantics of `counts` to include dereferencing
> > to the `int` in order to be a valid count. We'd be sacrificing the
>
> The lone struct member delayed parsing is already implemented in Qing's
> series, so that isn't an issue. i.e. this is parsed fine:
>
> struct S {
>   char *start_buffer __counted_by(start_len);
>   int start_len;
>   int end_len;
>   char *end_buffer __counted_by(end_len);
> };
>
> Doing this for an _expression_ is, as I understand it, the sticking point.
>
> > ability to handle the "others my script couldn't categorize", but
> > that's 28 out of the 905 total cases and maybe that's acceptable?
>
> Three of those patterns are pretty important in Linux, though:
> - nested struct members
> - arithmetic adjustments (e.g. the count of an array includes the
>   rest of the struct size or is a byte count instead of element count)
> - making calls to helper functions
>
> For helper functions, the most common need is doing endian conversions
> (e.g. for protocol (de)serializing, where a length is stored in a
> different byte order than the native CPU byte order):
>
> struct S {
>   struct header hdr;
>   __be32 bytes;
>   struct info array[] __counted_by(be32_to_cpu(bytes) / sizeof(struct info));
> };

Ah, that's helpful. Yeah, my idea doesn't help with that kind of
situation; we'd basically have to keep those patterns out of scope until
some other solution comes along. Hopefully we can find a better solution
that meets those needs too.
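
To make the endian case concrete, here is a rough userspace sketch of
the computation that `be32_to_cpu(bytes) / sizeof(struct info)` stands
for. The `sketch_` helpers are hypothetical portable stand-ins for the
kernel's byte-order helpers, and the layout of `struct info`, the
`struct msg` type, and `msg_info_count` are assumptions made only for
the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed element layout: two 32-bit fields, 8 bytes in total. */
struct info {
    uint32_t a;
    uint32_t b;
};

/* Hypothetical stand-in for be32_to_cpu(): decode a big-endian value
 * byte by byte, so it works regardless of host byte order. */
static uint32_t sketch_be32_to_cpu(uint32_t be)
{
    unsigned char b[4];

    memcpy(b, &be, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Hypothetical stand-in for cpu_to_be32(), used to build test input. */
static uint32_t sketch_cpu_to_be32(uint32_t host)
{
    unsigned char b[4] = {
        (unsigned char)(host >> 24), (unsigned char)(host >> 16),
        (unsigned char)(host >> 8), (unsigned char)host
    };
    uint32_t be;

    memcpy(&be, b, sizeof(be));
    return be;
}

struct msg {
    uint32_t bytes;       /* byte length of array[], stored big-endian */
    struct info array[];  /* flexible array member */
};

/* Element count derived from the big-endian byte length. */
static size_t msg_info_count(const struct msg *m)
{
    return sketch_be32_to_cpu(m->bytes) / sizeof(struct info);
}
```

The count depends on both a function call and a division, which is why
a single in-scope identifier can't express it.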

~Aaron

>
> --
> Kees Cook
