Re: [PATCH] [RFC] Delayed parsing for bounds safety attributes

Bill Wendling Mon, 28 Jul 2025 13:05:43 -0700

On Sat, Jul 26, 2025 at 1:05 AM Martin Uecker <ma.uec...@gmail.com> wrote:
> On Friday, 25.07.2025 at 20:07 -0700, Bill Wendling wrote:
[snip, because I'm part of the choir]
> >  I don't know what else to tell you except that
> > resolving to the struct first is the exact behavior we've been trying
> > to get correct between GCC and Clang. All arguments for making the
> > name resolution more explicit have been more-or-less shrugged off. And
> > even if we adopted the dot-notation (which I'm not against doing), we
> > would *still* need some form of delayed parsing.
>
> Only if we adopted the dot syntax while also allowing arbitrary
> C expressions.


??? No. I've suggested on the GCC mailing list that we could perform
the "acceptable lookup" multiple times for an expression and it was
rejected out of hand. If that's not what you mean, then please tell me
exactly how we're going to resolve the dot-notated symbols before
they're "seen" without delayed parsing. I've asked this many, many
times and have never gotten an answer to it.
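
To make the forward-reference problem concrete, here is a minimal
sketch. COUNTED_BY is a hypothetical stand-in macro that expands to
nothing so the example compiles anywhere; it is not the real attribute
spelling in either compiler. The struct and helper names are also
made up for illustration. The point is only that '.len' is named
before 'len' has been declared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: expands to nothing so this compiles on any
 * C compiler. A real attribute would have to parse its argument. */
#define COUNTED_BY(expr)

struct buf {
    /* At this point the parser has not yet seen 'len', so '.len'
     * cannot be resolved until the struct body is complete. */
    char *data COUNTED_BY(.len);
    size_t len;
};

/* What a bounds check driven by the annotation would amount to. */
static int index_in_bounds(const struct buf *b, size_t i)
{
    assert(b != NULL);
    return i < b->len;
}
```

Whatever syntax is chosen, the argument has to be held back until
'len' is known, which is what delayed parsing does.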

And I've explained exactly which kinds of C expressions are
"allowable" within these attributes: only the "acceptable"
expressions, i.e., "simple" ones, not complex and certainly not
arbitrary. Furthermore, we can check to ensure that the expressions
*are* simple.

> We can also define a sub language for bounds that can be parsed without
> knowing the types of the variables.

What?! No! We don't need to do anything like this at all.

> For example, we could say we allow only expressions of the form
>
> .N + offset
>
> where all constants and variables are always converted to size_t but
> with overflow being a run-time error.
>
This fails when you have something like this:

  .N + sizeof(.Z)

where Z is an 'unsigned char' or something other than 'size_t'.
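
A rough sketch of why the type matters, using ordinary member access
in place of the dot syntax. The struct and function names here are
hypothetical and only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
    size_t N;
    unsigned char Z;
};

/* Bound computed with Z's declared type: sizeof(unsigned char) is 1. */
static size_t bound_with_real_type(const struct pkt *p)
{
    assert(p != NULL);
    return p->N + sizeof(p->Z);
}

/* Bound computed as if every variable were treated as size_t. */
static size_t bound_if_all_size_t(const struct pkt *p)
{
    assert(p != NULL);
    return p->N + sizeof(size_t);
}
```

With N == 10 the first function returns 11, while the second returns
10 + sizeof(size_t), so the result depends on knowing Z's type.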

I've explained already what the "acceptable" expressions are. Please
go read that. It's more than just "<identifier + constant>", which
doesn't cover a lot of the other cases we want to handle, like calling
a function to byteswap, or accessing a field in a sub-struct.
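
As an illustration of those two cases, here is a sketch of the
computation such a bound would express, written as plain C. All names
(swap16, struct hdr, struct msg, payload_bound) are hypothetical, and
the attribute spelling in the comment is an assumption, not existing
syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable 16-bit byteswap, written by hand for the example. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

struct hdr {
    uint16_t len_be;            /* length stored byte-swapped */
};

struct msg {
    struct hdr h;
    /* Hypothetically annotated with something like
     * counted_by(swap16(.h.len_be)). */
    unsigned char *payload;
};

/* The bound: a function call applied to a field in a sub-struct. */
static size_t payload_bound(const struct msg *m)
{
    assert(m != NULL);
    return swap16(m->h.len_be);
}
```

Neither form fits "<identifier + constant>", yet both are still simple
enough to check.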

> > Yes and no. We started with a single identifier, but the idea of using
> > expressions more complex than '.N + 3' was always the goal. And, from
> > my understanding, Clang does support that. All of the work we've been
> > doing has been to support the expression stuff, and that's really my
> > main focus for this RFC; specifically expressions within the attribute
> > used in structs. Whether this RFC could be used for parameters has yet
> > to be seen (I suspect that it could, but would be more invasive).
>
> The original idea in WG14.  We have been discussing these things for
> quite a while.  But I agree that we need some expressions in structures.
>
> But the key question is: Do we need to invoke the full language parser?
>
> To me it seems that we would need to restrict this very much, because
> I do not think we want to allow evaluation of arbitrary expressions
> on each structure access anyhow.
>
> A small sub language for bounds annotation seems to be an entirely
> reasonable approach to me, and we have similar heavily constrained
> sublanguages in C already, e.g. for address constants (which might
> need to be passed down to the linker).
>
Invoking the "full language parser" is only a convenience, because it
already has the logic in it to handle the expressions we feed it. It
saves us from having to find a suitable point in the parser to start
from. And quite frankly it shouldn't add anything to the parse time,
because it would have been parsed that way at the attribute point. But
if you wish to invoke the parser at a separate entry point, sure, why
not. Find one that won't have 10^1000 edge cases flooding in.

-bw
