Re: [PATCH] [RFC] Delayed parsing for bounds safety attributes

Martin Uecker Mon, 28 Jul 2025 14:40:09 -0700

On Monday, 28.07.2025 at 20:48 +0000, Qing Zhao wrote:
> 
> > On Jul 28, 2025, at 16:09, Martin Uecker <ma.uec...@gmail.com> wrote:
> > 
> > On Monday, 28.07.2025 at 11:18 -0700, Yeoul Na wrote:
> > > 
> > > 
> > > > On Jul 28, 2025, at 10:27 AM, Qing Zhao <qing.z...@oracle.com> wrote:
> > > > 
> > > > 
> > > > 
> > > > > On Jul 26, 2025, at 12:43, Yeoul Na <yeoul...@apple.com> wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > > On Jul 24, 2025, at 3:52 PM, Kees Cook <k...@kernel.org> wrote:
> > > > > > 
> > > > > > On Thu, Jul 24, 2025 at 04:26:12PM +0000, Aaron Ballman wrote:
> > > > > > > Ah, apologies, I wasn't clear. My thinking is: we're (Clang folks)
> > > > > > > going to want it to work in C++ mode because of shared headers.
> > > > > > > If it works in C++ mode, then we have to figure out what it means
> > > > > > > with all the various C++ features that are possible, not just the
> > > > > > > use cases
> > > > > > 
> > > > > > I am most familiar with C, so I may be missing something here, but
> > > > > > if -fbounds-safety is intended to be C only, then why not just make
> > > > > > it unrecognized in C++?
> > > > > 
> > > > > The bounds safety annotations must also be parsable in C++. While C++
> > > > > can get bounds checking by using std::span instead of raw pointers,
> > > > > switching to std::span breaks ABI. Therefore, in many situations, C++
> > > > > code must continue to use raw pointers, for example when
> > > > > interoperating with C code by sharing headers with C. In such cases,
> > > > > bounds annotations can help close safety gaps in raw pointers.
> > > > 
> > > > The -fbounds-safety feature was initially proposed as a C extension, so
> > > > it's natural to make it compatible with the C language, not C++.
> > > > If C++ also needs such a feature, then an extension to C++ is needed too.
> > > > If a consistent syntax for this feature can satisfy both C and C++,
> > > > that would be ideal.
> > > > However, if providing such a consistent syntax requires major changes
> > > > to the C language (a new name lookup scope and late parsing), it might
> > > > be a good idea to provide different syntax for C and C++.
> > > 
> > > 
> > > So the main problem here is when the "same code" is parsed in both C
> > > and C++, which is quite common in practice.
> > >
> > > Therefore, we need a way to reasonably write code that works in both C
> > > and C++.
> > > 
> > > From my perspective, that means:
> > > 
> > > 1. The same spelling doesn't "silently" behave differently in C and C++.
> > > 2. At least the most common use cases (i.e., __counted_by(peer)) should
> > > be writable the same way in C and C++, without ceremony.
> > > 
> > > Here is our compromise proposal that meets these requirements, until we 
> > > get blessing from the standard for a more elegant solution:
> > > 
> > > 1. `__counted_by(member)` keeps working as is: late parsing + name lookup
> > > finds the member name first.
> > > 2. `__counted_by_expr(expr)` uses a new syntax (e.g., __self) to refer to
> > > members. A name that matches a member name is not allowed without the
> > > new syntax, even if it would otherwise have resolved to a global
> > > variable. Use something like `__global_ref(id)` to disambiguate. This
> > > rule prevents the confusion where `__counted_by_expr(id)` and
> > > `__counted_by(id)` may designate different entities.
> > > 
> > > Here are the examples:
> > > 
> > > Ex 1)
> > > constexpr int n = 10;
> > > 
> > > struct s {
> > >   // resolves to member `n`, which matches the current behavior
> > >   int *__counted_by(n) ptr;
> > >   int n;
> > > };
> > > 
> > > Ex 2)
> > > constexpr int n = 10;
> > > struct s {
> > >   // error: referring to a member name without "__self."
> > >   int *__counted_by_expr(n) ptr;
> > >   int n;
> > > };
> > > 
> > > Ex 3)
> > > constexpr int n = 10;
> > > struct s {
> > >   int *__counted_by_expr(__self.n) ptr; // resolves to member `n`
> > >   int n;
> > > };
> > > 
> > > 
> > > Ex 4)
> > > constexpr int n = 10;
> > > struct s {
> > >   int *__counted_by_expr(__self.n + 1) ptr; // resolves to member `n`
> > >   int n;
> > > };
> > > 
> > > 
> > > Ex 5)
> > > constexpr int n = 10;
> > > struct s {
> > >   // resolves to global `n`
> > >   int *__counted_by_expr(__global_ref(n) + 1) ptr;
> > >   int n;
> > > };
> > > 
> > > 
> > > Ex 6)
> > > constexpr int n = 10;
> > > struct s {
> > >   // resolves to global `n`; okay, no matching member name
> > >   int *__counted_by_expr(n + 1) ptr;
> > > };
> > > 
> > > Or, in case people prefer forward declarations inside
> > > `__counted_by_expr()`, a similar rule can apply to achieve the same
> > > goal.
> > > 
> > 
> > Thank you Yeoul! 
> > 
> > I think it is a reasonable compromise.
> 
> Yes, I agree. -:)
> 
> It adds two new keywords in both C and C++ (__self and __global_ref) to
> explicitly mark the scopes for the variables inside the attribute.
> This will definitely resolve the lookup scope ambiguity issue in both C
> and C++.
>
> However, it will not resolve the issue when the counted_by field is declared
> after the pointer field.
> So forward declarations are still needed to resolve this issue, I think.


Yes, forward declarations are the simplest solution.


Another idea I mentioned before is to let __self.N provisionally have
type int, and then emit an error later if its actual type would change
the type / meaning of the immediate parent expression.

This would allow all of the following:

struct foo { 
        char * __counted_by_expr(__self.N) buf;
        int N;
};
struct foo {
        char * __counted_by_expr(__self.N + 1L) buf;
        long N;
};
struct foo {
        char * __counted_by_expr(__self.N * 2) buf;
        int N;
};
struct foo {
        char * __counted_by_expr(__self.N + 2) buf;
        char N;
};
struct foo {
        char * __counted_by_expr(__self.N + __self.M) buf;
        int N; int M;
};
struct foo {
        char * __counted_by_expr((int)__self.N) buf;
        double N;
};
struct foo {
        char * __counted_by_expr(3 * sizeof(__self.buf2)) buf;
        char buf2[5];
};
struct foo {
        char * __counted_by_expr(((struct bar *)__self.x)->z) buf; 
        struct bar *x;
};


It would *not* allow:

struct foo {
        char * __counted_by_expr(__self.N + 1) buf;
        long N;
};
struct foo {
        char * __counted_by_expr(__self.x->z) buf;
        struct foo *x;
};


But in this case you would get an explicit error:

xyz:13.4: Type of `__self.N' needs to be known.  Did you forget to
add a cast `(long)__self.N'?
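
A rough way to see why the type of the parent expression is what matters
is the following C11 sketch. It is only an illustration, not part of any
patch: it uses _Generic to compare the type of an expression built from
the provisional `int` with the type of the same expression built from the
member's real type. TYPE_ID, SAME_TYPE and the stable_* functions are
hypothetical names invented for this example; a compiler would presumably
do such a check on its own expression trees.

```c
#include <assert.h>   /* static_assert (C11) */

/* Classify the type of an expression (only the types needed here). */
enum type_id { T_INT, T_LONG, T_DOUBLE, T_OTHER };
#define TYPE_ID(e) _Generic((e), int: T_INT, long: T_LONG, \
                            double: T_DOUBLE, default: T_OTHER)

/* Nonzero if both expressions fall into the same type class. */
#define SAME_TYPE(a, b) (TYPE_ID(a) == TYPE_ID(b))

/* Sanity check: an integer literal is classified as int. */
static_assert(TYPE_ID(0) == T_INT, "0 should have type int");

/* Each function compares the parent expression under the provisional
 * type (int) with the same expression under the member's real type.
 * Nonzero means the provisional guess did not change the result type. */

/* __self.N + 1L with `long N`: int + long and long + long are both long. */
int stable_long_plus_1L(void)
{
    return SAME_TYPE((int)0 + 1L, (long)0 + 1L);
}

/* __self.N + 2 with `char N`: char promotes to int, so both sides are int. */
int stable_char_plus_2(void)
{
    return SAME_TYPE((int)0 + 2, (char)0 + 2);
}

/* (int)__self.N with `double N`: the cast fixes the type to int. */
int stable_cast_from_double(void)
{
    return SAME_TYPE((int)(int)0, (int)(double)0);
}

/* __self.N + 1 with `long N`: int + int is int, but long + int is long. */
int stable_long_plus_1(void)
{
    return SAME_TYPE((int)0 + 1, (long)0 + 1);
}
```

The first three functions return 1 and the last one returns 0, which
matches the split between the allowed and the rejected examples above.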



Martin
