On Wed, Jul 23, 2025 at 7:40 PM Kees Cook <k...@kernel.org> wrote:
>
> On Wed, Jul 23, 2025 at 11:53:40AM +0000, Aaron Ballman wrote:
> > That said, John McCall pointed out some usage patterns Apple has with
> > their existing feature:
> >
> > * 655 simple references to variables or struct members: __counted_by(len)
> > * 73 dereferences of variables or struct members: __counted_by(*lenp)
> > * 80 integer literals: __counted_by(8)
> > * 60 macro references: __counted_by(NUM_EIGHT) [1]
> > * 9 simple sizeof expressions: __counted_by(sizeof(eight_bytes_t))
> > * 28 others my script couldn’t categorize:
> >   * 7 more complicated integer constant expressions:
> >     __counted_by(num_bytes_for_bits(NUM_FIFTY_SEVEN)) [2]
> >   * 16 arithmetically-adjusted references to a single variable or
> >     struct member: __counted_by(2 * len + 8)
> >   * 1 nested struct member: __counted_by(header.len)
> >   * 4 combinations of struct members: __counted_by(len + cnt) [3]
> >
> > Do the Linux kernel folks think this looks somewhat like what their
> > usage patterns will be as well?
>
> Yes, this matches my expectations for its usage, though there is one
> case I don't see explicitly mentioned above, which is referencing a
> global variable (but if a function can be used, then an accessor can be
> created for returning the global).
Good to know!

> > If so, I'd like to argue for my
> > personal stake in the ground: we don't need any new language features
> > to solve this problem, we can use the existing facilities to do so and
> > downscope the initial feature set until a better solution comes along
> > for forward references. Use two attributes: counted_by (whose argument
> > specifies an already in-scope identifier of what holds the count) and
> > counts (whose argument specifies an already in-scope identifier of
> > what it counts). e.g.,
> > ```
> > struct S {
> >   char *start_buffer;
> >   int start_len __counts(start_buffer);
> >   int end_len;
> >   char *end_buffer __counted_by(end_len);
> > };
> >
> > void func(char *buffer, int N __counts(buffer), int M, char *buffer
> > __counted_by(M));
> > ```
> > It's kind of gross to need two attributes to do the same notional
> > thing, but it does solve the vast majority of the usages seen in the
> > wild if you're willing to accept some awkwardness around things like:
> > ```
> > struct S {
> >   char *buffer;
> >   int *len __counts(buffer); // Note that len is a pointer
> > };
> > ```
> > because we'd need the semantics of `counts` to include dereferencing
> > to the `int` in order to be a valid count. We'd be sacrificing the
>
> The lone struct member delayed parsing is already implemented in Qing's
> series, so that isn't an issue. i.e. this is parsed fine:
>
> struct S {
>   char *start_buffer __counted_by(start_len);
>   int start_len;
>   int end_len;
>   char *end_buffer __counted_by(end_len);
> };
>
> Doing this for an _expression_ is, as I understand it, the sticking point.
>
> > ability to handle the "others my script couldn't categorize", but
> > that's 28 out of the 905 total cases and maybe that's acceptable?
>
> Three of those patterns are pretty important in Linux, though:
> - nested struct members
> - arithmetic adjustments (e.g. the count of an array includes the
>   rest of the struct size or is a byte count instead of element count)
> - making calls to helper functions
>
> For helper functions, the most common need is doing endian conversions
> (e.g. for protocol (de)serializing, where a length is stored in a
> different byte order than the native CPU byte order):
>
> struct S {
>   struct header hdr;
>   __be32 bytes;
>   struct info array[] __counted_by(be32_to_cpu(bytes) / sizeof(struct info));
> };

Ah, that's helpful; yeah, my idea doesn't help with that kind of
situation, we'd have to basically keep them out of scope until some
other solution comes along. Hopefully we can find a better solution
that meets those needs too.

~Aaron

>
> --
> Kees Cook
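
For anyone following along, here is a rough userspace sketch of what the endian-conversion count quoted above would have to evaluate. It is not code from the thread or from the kernel: the layouts of `struct header` and `struct info` are made up, `uint32_t` stands in for `__be32`, and `be32_to_cpu_sketch()`, `cpu_to_be32_sketch()`, `s_count()` and `s_alloc()` are hypothetical helpers. The attribute itself appears only in a comment, since an expression argument is exactly the part that is still under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types named in the thread. */
struct header { uint16_t type; uint16_t flags; };
struct info { uint32_t id; uint32_t value; };

struct S {
    struct header hdr;
    uint32_t bytes;       /* byte length, stored big-endian (__be32 in the kernel) */
    struct info array[];  /* desired: __counted_by(be32_to_cpu(bytes) / sizeof(struct info)) */
};

/* Portable stand-in for be32_to_cpu(): read the stored bytes MSB first. */
static uint32_t be32_to_cpu_sketch(uint32_t be)
{
    unsigned char b[4];

    memcpy(b, &be, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Inverse conversion: lay the value out MSB first in memory. */
static uint32_t cpu_to_be32_sketch(uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8), (unsigned char)v
    };
    uint32_t be;

    memcpy(&be, b, sizeof(be));
    return be;
}

/* The count expression from the quoted struct, written as a function. */
static size_t s_count(const struct S *s)
{
    return be32_to_cpu_sketch(s->bytes) / sizeof(struct info);
}

/* Allocate room for n elements and record the length as a big-endian byte count. */
static struct S *s_alloc(uint32_t n)
{
    struct S *s;

    assert(n <= UINT32_MAX / sizeof(struct info));
    s = calloc(1, sizeof(*s) + (size_t)n * sizeof(struct info));
    if (s)
        s->bytes = cpu_to_be32_sketch(n * (uint32_t)sizeof(struct info));
    return s;
}
```

The point of the sketch is that the element count is never stored anywhere: it only exists as the result of a function call plus a division, which is why an attribute limited to a bare in-scope identifier cannot describe it.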