Re: [PATCH] [RFC] Delayed parsing for bounds safety attributes

Martin Uecker Mon, 28 Jul 2025 14:47:09 -0700

On Monday, 28.07.2025 at 23:39 +0200, Martin Uecker wrote:
> On Monday, 28.07.2025 at 20:48 +0000, Qing Zhao wrote:
> > 
> > > On Jul 28, 2025, at 16:09, Martin Uecker <ma.uec...@gmail.com> wrote:
> > > 
> > > On Monday, 28.07.2025 at 11:18 -0700, Yeoul Na wrote:
> > > > 
> > > > 
> > > > > On Jul 28, 2025, at 10:27 AM, Qing Zhao <qing.z...@oracle.com> wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > > On Jul 26, 2025, at 12:43, Yeoul Na <yeoul...@apple.com> wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > On Jul 24, 2025, at 3:52 PM, Kees Cook <k...@kernel.org> wrote:
> > > > > > > 
> > > > > > > On Thu, Jul 24, 2025 at 04:26:12PM +0000, Aaron Ballman wrote:
> > > > > > > > Ah, apologies, I wasn't clear. My thinking is: we're (Clang folks)
> > > > > > > > going to want it to work in C++ mode because of shared headers. If it
> > > > > > > > works in C++ mode, then we have to figure out what it means with all
> > > > > > > > the various C++ features that are possible, not just the use cases
> > > > > > > 
> > > > > > > I am most familiar with C, so I may be missing something here, but if
> > > > > > > -fbounds-safety is intended to be C only, then why not just make it
> > > > > > > unrecognized in C++?
> > > > > > 
> > > > > > The bounds safety annotations must also be parsable in C++. While
> > > > > > C++ can get bounds checking by using std::span instead of raw
> > > > > > pointers, switching to std::span breaks ABI. Therefore, in many
> > > > > > situations, C++ code must continue to use raw pointers, for example
> > > > > > when interoperating with C code by sharing headers with C. In such
> > > > > > cases, bounds annotations can help close safety gaps in raw pointers.
> > > > > 
> > > > > The -fbounds-safety feature was initially proposed as a C extension, so
> > > > > it's natural to make it compatible with the C language, not C++.
> > > > > If C++ also needs such a feature, then an extension to C++ is needed
> > > > > too.
> > > > > If a consistent syntax for this feature can satisfy both C and C++,
> > > > > that will be ideal.
> > > > > However, if providing such a consistent syntax requires major changes
> > > > > to the C language (a new name lookup scope, and late parsing), it might
> > > > > be a good idea to provide different syntax for C and C++.
> > > > 
> > > > 
> > > > So the main problem here is when the "same code" will be parsed in both
> > > > C and C++, which is quite common in practice.
> > > > 
> > > > Therefore, we need a way to reasonably write code that works in both C
> > > > and C++.
> > > > 
> > > > From my perspective, that means:
> > > > 
> > > > 1. The same spelling doesn't "silently" behave differently in C and C++.
> > > > 2. At least the most common use cases (i.e., __counted_by(peer)) should
> > > > be able to be written the same way in C and C++, without ceremony.
> > > > 
> > > > Here is our compromise proposal that meets these requirements, until we 
> > > > get blessing from the standard for a more elegant solution:
> > > > 
> > > > 1. `__counted_by(member)` keeps working as is: late parsing + name
> > > > lookup finds the member name first.
> > > > 2. `__counted_by_expr(expr)` uses a new syntax (e.g., `__self`) and is
> > > > not allowed to use a name that matches a member name without the new
> > > > syntax, even if that name would otherwise have resolved to a global
> > > > variable. Use something like `__global_ref(id)` to disambiguate. This
> > > > rule prevents the confusion where `__counted_by_expr(id)` and
> > > > `__counted_by(id)` may designate different entities.
> > > > 
> > > > Here are the examples:
> > > > 
> > > > Ex 1)
> > > > constexpr int n = 10;
> > > > 
> > > > struct s {
> > > >   int *__counted_by(n) ptr; // resolves to member `n`, which matches the current behavior
> > > >   int n;
> > > > };
> > > > 
> > > > Ex 2)
> > > > constexpr int n = 10;
> > > > struct s {
> > > >   int *__counted_by_expr(n) ptr; // error: referring to a member name without "__self."
> > > >   int n;
> > > > };
> > > > 
> > > > Ex 3)
> > > > constexpr int n = 10;
> > > > struct s {
> > > >   int *__counted_by_expr(__self.n) ptr; // resolves to member `n`
> > > >   int n;
> > > > };
> > > > 
> > > > 
> > > > Ex 4)
> > > > constexpr int n = 10;
> > > > struct s {
> > > >   int *__counted_by_expr(__self.n + 1) ptr; // resolves to member `n`
> > > >   int n;
> > > > };
> > > > 
> > > > 
> > > > Ex 5)
> > > > constexpr int n = 10;
> > > > struct s {
> > > >   int *__counted_by_expr(__global_ref(n) + 1) ptr; // resolves to global `n`
> > > >   int n;
> > > > };
> > > > 
> > > > 
> > > > Ex 6)
> > > > constexpr int n = 10;
> > > > struct s {
> > > >   int *__counted_by_expr(n + 1) ptr; // resolves to global `n`; okay, no matching member name
> > > > };
> > > > 
> > > > Or, in case people prefer forward declarations inside
> > > > `__counted_by_expr()`, a similar rule can apply to achieve the same
> > > > goal.
> > > > 
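
For context on why rule 1 needs late parsing and a special lookup: in
today's C, an identifier used inside a struct body is found by ordinary
lookup in the enclosing scope, and member names live in a separate name
space. A minimal sketch in standard C11, with no bounds-safety attributes
involved (the names `s`, `buf` and `buf_len` are purely illustrative):

```c
#include <assert.h>
#include <stddef.h>

enum { n = 10 };            /* file-scope constant named n */

struct s {
    int buf[n];             /* ordinary lookup: this n is the file-scope constant */
    int n;                  /* member n, declared later, in the member name space */
};

/* The array bound was taken from the file-scope n, not from the member. */
static_assert(sizeof(((struct s *)0)->buf) == 10 * sizeof(int),
              "buf is sized by the file-scope n");

/* Number of elements in s.buf, fixed at compile time. */
size_t buf_len(void)
{
    return sizeof(((struct s *)0)->buf) / sizeof(int);
}
```

This is the behavior that Ex 6 relies on when no member matches, and the
one that `__counted_by(n)` in Ex 1 deliberately departs from. A C++
compiler may reject the same struct, because the later member declaration
changes the meaning of `n` inside the class.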
> > > 
> > > Thank you Yeoul! 
> > > 
> > > I think it is a reasonable compromise.
> > 
> > Yes, I agree. -:)
> > 
> > It adds two new keywords in both C and C++ (__self and __global_ref) to
> > explicitly mark the scopes for the variables inside the attribute.
> > This will definitely resolve the lookup scope ambiguity issue in both C
> > and C++.
> > 
> > However, it will not resolve the issue when the counted_by field is
> > declared after the pointer field.
> > So forward declarations are still needed to resolve this issue, I think.
> 
> Yes, forward declarations are the simplest solution.
> 
> 
> Another idea I mentioned before is to let __self.N have type
> int, and then emit an error later if its actual type turns out
> to change the type / meaning of the immediate
> parent expression.
> 
> This would allow all of the following:
> 
> struct foo { 
>       char * __counted_by_expr(__self.N) buf;
>       int N;
> };
> struct foo {
>       char * __counted_by_expr(__self.N + 1L) buf;
>       long N;
> };
> struct foo {
>       char * __counted_by_expr(__self.N * 2) buf;
>       int N;
> };
> struct foo {
>       char * __counted_by_expr(__self.N + 2) buf;
>       char N;
> };
> struct foo {
>       char * __counted_by_expr(__self.N + .M) buf;
>       int N; int M;
> };
> struct foo {
>       char * __counted_by_expr((int)__self.N) buf;
>       double N;
> };
> struct foo {
>       char * __counted_by_expr(3 * sizeof(__self.buf2)) buf;
>       char buf2[5];
> };
> struct foo {
>       char * __counted_by_expr(((struct bar *)__self.x)->z) buf; 
>       struct bar *x;
> };


Basic function calls would also work, where one could add
a special rule that `__self.x` is assumed to have the type of
the corresponding argument.


size_t bar(struct foo x);
struct foo {
        char * __counted_by_expr(bar(__self.x)) buf;
        struct foo x;
};


Martin

> 
> 
> It would *not* allow:
> 
> struct foo {
>       char * __counted_by_expr(__self.N + 1) buf;
>       long N;
> };
> struct foo {
>       char * __counted_by_expr(__self.x->z) buf;
>       struct foo *x;
> };
> 
> 
> But in this case you would get an explicit error:
> 
> xyz:13.4: Type of `__self.N' needs to be known.  Did you forget to
> add a cast `(long)__self.N'?
>
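
The allowed / not-allowed split above follows from the usual arithmetic
conversions, which can be checked in standard C11 with `_Generic`. This is
only a sketch of the type reasoning, not of any compiler implementation:
the local `a` stands in for the int placeholder assumed for `__self.N`,
`n` for the member's real type, and the macro and function names are
hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Map the static type of an expression to a name (C11 _Generic). */
#define TYPE_NAME(e) _Generic((e), \
    char: "char", int: "int", long: "long", double: "double", \
    default: "other")

/* 1 if both expressions have the same type name, else 0. */
#define SAME_TYPE(x, y) (strcmp(TYPE_NAME(x), TYPE_NAME(y)) == 0)

/* __self.N + 1L with long N: int + long and long + long are both long. */
int stable_long_plus_1L(void)
{
    int a = 0; long n = 0;
    return SAME_TYPE(a + 1L, n + 1L);
}

/* __self.N + 2 with char N: char promotes to int, so both sides are int. */
int stable_char_plus_2(void)
{
    int a = 0; char n = 0;
    return SAME_TYPE(a + 2, n + 2);
}

/* (int)__self.N with double N: the cast fixes the result type to int. */
int stable_cast_double(void)
{
    int a = 0; double n = 0;
    return SAME_TYPE((int)a, (int)n);
}

/* __self.N + 1 with long N: int + int is int, but long + int is long. */
int stable_long_plus_1(void)
{
    int a = 0; long n = 0;
    return SAME_TYPE(a + 1, n + 1);
}
```

The first three functions return 1, meaning the parent expression has the
same type under the int assumption and under the real member type. The
last returns 0, which is the case where the explicit error asking for a
`(long)` cast would be emitted.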
