Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

Qing Zhao Tue, 01 Apr 2025 11:59:07 -0700


> On Apr 1, 2025, at 11:28, Martin Uecker <[email protected]> wrote:
> 
> Am Dienstag, dem 01.04.2025 um 15:01 +0000 schrieb Qing Zhao:
>> 
>>> On Apr 1, 2025, at 10:04, Martin Uecker <[email protected]> wrote:
>>> 
>>> 
>>> 
>>> Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling:
>>>>> I'd like to offer up this to solve the issues we're facing. This is a
>>>>> combination of everything that's been discussed here (or at least that
>>>>> I've been able to read in the centi-thread :-).
>>> 
>>> Thanks! I think this proposal is much better as it avoids undue burden
>>> on parsers, but it does not address all my concerns.
>>> 
>>> 
>>> From my side, the issue about compromising the scoping rules of C
>>> is also about unintended non-local effects of code changes. In
>>> my opinion, a change in a library header elsewhere should not cause 
>>> code in a local scope (which itself might also come from a macro) to
>>> emit a warning or require a programmer to add a workaround. So I am
>>> not convinced that adding warnings or a workaround such as
>>> __builtin_global_ref  is a good solution.
>>> 
>>> 
>>> I could see the following as a possible way forward: We only 
>>> allow the following two syntaxes:
>>> 
>>> 1. Single argument referring to a member.
>>> 
>>> __counted_by(len)
>>> 
>>> with an argument that must be a single identifier and where
>>> the identifier then must refer to a struct member. 
>>> 
>>> (I still think this is not ideal and potentially
>>> confusing, but in contrast to new scoping rules it is
>>> at least relatively easy to explain as a special rule.)
>>>


So, in allowed syntax 1, the identifier inside the counted_by attribute will be
looked up inside the structure.

This is our current implementation of counted_by for FAMs (flexible array
members), and of my previously submitted patch for counted_by on pointers
inside structures.

Keeping this syntax is good. 
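
For reference, syntax 1 on a flexible array member can be sketched as below.
This is only an illustration: the struct, the functions, and the COUNTED_BY
wrapper macro are hypothetical names, and the macro expands to nothing on
compilers that do not report support for the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: use the attribute only where the compiler
   reports support for it, and expand to nothing elsewhere. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

struct int_vec {
    size_t len;                  /* the member named by the attribute */
    int data[] COUNTED_BY(len);  /* "len" is looked up inside struct int_vec */
};

/* Allocate a vector with n zeroed elements; returns NULL on failure. */
static struct int_vec *int_vec_new(size_t n)
{
    /* Reject sizes whose byte count would overflow size_t. */
    if (n > (SIZE_MAX - sizeof(struct int_vec)) / sizeof(int))
        return NULL;

    struct int_vec *v = calloc(1, sizeof *v + n * sizeof(int));
    if (v != NULL)
        v->len = n;  /* set the count before any access to data[] */
    return v;
}

/* Manual bounds check against the same member the attribute names. */
static int int_vec_get(const struct int_vec *v, size_t i)
{
    assert(i < v->len);
    return v->data[i];
}
```

The single identifier in the attribute never needs an ordinary-scope lookup
here, which is what makes this form easy to describe as a special rule.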

>>> 
>>> 2. Forward declarations. 
>>> 
>>> __counted_by(size_t len; len + PADDING)
>> 
>> In the above, the PADDING is some constant? 
> 
> In principle - when considering only the name lookup rules -
> it could be a constant, a global variable, or an automatic
> variable, i.e. any ordinary identifier which is visible at
> this point. 

I am a little confused here:
Is syntax 2 a new syntax, with name lookup rules different from those of
syntax 1?

How should the identifiers inside the counted_by attribute be looked up with
this syntax? Inside the structure first, and then, if not found, in the outer
scope for identifiers in the PADDING part?

If so, has a new scope been introduced?
Or are there some other special lookup rules for the counted_by attribute?

> 
>> 
>> More complicated expressions involving globals will not be supported?
> 
> I think one could allow such expressions, but I think the
> expressions should be restricted to expressions which have
> no side effects.

See my question above: does this new syntax 2 introduce a new “structure
scope”, so that identifiers are looked up inside the structure first, as in
syntax 1? Or does this new syntax follow the same lookup rules as current C,
and NOT look inside the structure first?
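
One possible reading of syntax 2, based on Martin's description quoted further
down (forward-declared names resolve to members, everything else follows
regular C lookup), can be modelled in today's C with an explicit helper. This
is a sketch under that assumption only: the struct, the helper names, and the
value of PADDING are all made up for illustration, and the attribute itself
appears only in a comment because no compiler accepts this syntax today.

```c
#include <assert.h>
#include <stddef.h>

enum { PADDING = 8 };  /* ordinary identifier, found by regular C lookup */

struct packet {
    unsigned char *buf;  /* would carry __counted_by(size_t len; len + PADDING) */
    size_t len;          /* declared after buf, hence the forward declaration */
};

/* Hypothetical helper spelling out the count expression by hand:
   "len" comes from the struct instance, "PADDING" from the enclosing scope. */
static size_t packet_count(const struct packet *p)
{
    return p->len + PADDING;
}

/* The bound that the attribute would express, checked manually. */
static unsigned char packet_at(const struct packet *p, size_t i)
{
    assert(i < packet_count(p));
    return p->buf[i];
}
```

Under this reading only "len" is ever treated as a member, so no lookup goes
through the struct first and then falls back to the outer scope.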

> 
>> 
>>> where then the second part can also be a more complicated 
>>> expression, but with the explicit requirement that all
>>> identifiers in this expression are then looked up according to
>>> regular C language rules. So except for the forward declared
>>> member(s) they are *never* looked up in the member namespace of
>>> the struct, i.e. no new name lookup rules are introduced.
>> 
>> One question here:
>> 
>> What’s the major issue if we’d like to add one new scoping rule, for example,
>> “Structure scope” (the same as the C++’s instance scope) to C? 
>> 
>> (In addition to the "VLA in structure" issue I mentioned in my previous
>> writeup, is there any other issue that prevents this new scoping rule from
>> being added to C?)
> 
> Note that the "VLA in structure" is a bit of a red herring.  The exact same
> issues apply to lookup of any other ordinary identifiers in this context.
> 
> enum { MY_BUF_SIZE = 100 };
> struct foo {
>  char buf[MY_BUF_SIZE];
> };
> 
Yes, this is because there is NO “structure scope” available in C. If a
“structure scope” were added to C, identifiers could be looked up inside the
“structure scope” first, before the outer scopes.
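
A small sketch of the current behaviour, using a deliberately clashing name
(the identifiers are chosen for illustration only): the array size is resolved
through ordinary lookup, so it never sees the member with the same name.

```c
#include <assert.h>
#include <stddef.h>

enum { len = 4 };      /* ordinary identifier at file scope */

struct foo {
    int  len;          /* member: lives in the struct's own name space */
    char buf[len];     /* "len" resolves to the enum constant, not the member */
};

/* Compile-time check that the array size came from the enum constant. */
static_assert(sizeof(((struct foo *)0)->buf) == 4,
              "buf is sized by the file-scope len");

/* Runtime view of the same fact. */
static size_t foo_buf_size(void)
{
    return sizeof(((struct foo *)0)->buf);
}
```

With a “structure scope” that is searched first, the same declaration would
instead name the member, which is the change in meaning for existing code
discussed below.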

> 
> C++ has instance scope for member functions. The rules for C++ are also
> complex and not very consistent (see the examples I posted earlier,
> demonstrating UB and compiler divergence). 

Yes, I studied those C++ examples when I wrote the proposal. My observation
was that in C++ the instance scope always has higher priority than local and
global scopes, i.e., when an identifier in the instance scope conflicts with
one in a local or global scope, the identifier in the instance scope shadows
the one with the same name in the outer scope.

But in C, there is no concept of “structure scope” at all. Identifiers are NOT
looked up inside a structure at all.

> For C such a concept would
> be new and much less useful, so the trade-off seems unfavorable (in
> contrast to C++ where it is needed).

This concept is needed whenever we need to refer to a member variable inside
the structure, such as in the counted_by attribute, or later when we extend
the C language to include the bounds info in the TYPE.

But I agree with you that introducing a new instance scope into C might be too
risky.


>  I also see other issues:  Fully
> supporting instance scope would require changes to how C is parsed, 
> placing a burden on all C compilers and tooling. Depending on how you 
> specify it, it would also cause a change in semantics
> for existing code, something C tries very hard to avoid.

Yes, agreed.
Introducing a new instance scope in C might be too risky, and therefore not
worth doing.


> If you add
> warnings as mitigation,  it has the problem that it causes non-local
> effects where introducing a name in an enclosing scope somewhere else
> now necessitates a change to unrelated code, exactly what scoping rules
> are meant to prevent.  

Yes, that’s right. 
> 
> In any case, it seems a major change with many ramifications, including
> possibly unintended ones. This should certainly not be done without
> having a clear specification and support from WG14 (and probably not
> done at all.)

Yes, I agree. 

Qing
> 
> Martin
> 
>> 
>> Qing
>> 
>> 
>>> 
>>> 
>>> I think this could address my concerns about breaking
>>> scoping in C. Still, I personally would prefer designator syntax
>>> for both C and C++ as a nicer solution, and one that already
>>> has some support from WG14.
>>> 
>>> Martin
>>> 
>>> 
>>>>> 
>>>>> ---
>>>>> 
>>>>> 1. The use of '__self' isn't feasible, so we won't use it. Instead,
>>>>> we'll rely upon the current behavior—resolving any identifiers to the
>>>>> "instance scope". This new scope is used __only__ in attributes, and
>>>>> resolves identifiers to those in the least enclosing, non-anonymous
>>>>> struct. For example:
>>>>> 
>>>>> struct foo {
>>>>> char count;
>>>>> struct bar {
>>>>>   struct {
>>>>>     int len;
>>>>>   };
>>>>>   struct {
>>>>>     struct {
>>>>>       int *valid_use __counted_by(len); // Valid.
>>>>>     };
>>>>>   };
>>>>>   int *invalid_use __counted_by(count); // Invalid.
>>>>> } b;
>>>>> };
>>>>> 
>>>>> Rationale: This is how '__guarded_by' currently resolves identifiers,
>>>>> so there's precedent. And if we can't force its usage in all
>>>>> situations, it's less a feature and more a "nicety" which will lead to
>>>>> a massive discrepancy between compiler implementations. Despite the
>>>>> fact that this introduces a new scoping mechanism to C, its use is not
>>>>> as extensive as C++'s instance scoping and will apply only to
>>>>> attributes. In the case where we have two different resolution
>>>>> techniques happening within the same structure (e.g. VLAs), we can
>>>>> issue warnings as outlined in Yeoul's RFC[1].
>>>>> 
>>>>> 2. A method of forward declaring variables will be added for variables
>>>>> that occur in the struct after the attribute. For example:
>>>>> 
>>>>> A: Necessary usage:
>>>>> 
>>>>> struct foo {
>>>>> int *buf __counted_by(char count; count);
>>>>> char count;
>>>>> };
>>>>> 
>>>>> B: Unnecessary, but still valid, usage:
>>>>> 
>>>>> struct foo {
>>>>> char count;
>>>>> int *buf __counted_by(char count; count);
>>>>> };
>>>>> 
>>>>> * The forward declaration is required in (A) but not in (B).
>>>>> * The type of 'count' as declared in '__counted_by' *must* match the real type.
>>>>> 
>>>>> Rationale: This alleviates the issues of "double parsing" for
>>>>> compilers that aren't able to handle it. (We can also remove the
>>>>> '-fexperimental-late-parse-attributes' flag in Clang.)
>>>>> 
>>>>> 3. A new builtin '__builtin_global_ref()' (or similarly named) is
>>>>> added to refer to variables outside of the most-enclosing structure.
>>>>> Example:
>>>>> 
>>>>> int count_that_will_never_change_we_promise;
>>>>> 
>>>>> struct foo {
>>>>> int *bar __counted_by(__builtin_global_ref(count_that_will_never_change_we_promise));
>>>>> unsigned flags;
>>>>> };
>>>>> 
>>>>> As Yeoul pointed out, there isn't a way to refer to variables that
>>>>> have been shadowed, so the 'global' in '__builtin_global_ref' is a bit
>>>>> of a misnomer as it could refer to a local variable.
>>>>> 
>>>>> Rationale: For those who need the flexibility to use variables outside
>>>>> of the struct, this is an acceptable escape route. It does make bounds
>>>>> checking less strict, though, as we can't track any modifications to
>>>>> the global, so caution must be used.
>>>>> 
>>>>> Bonus suggestion (by yours truly):
>>>>> 
>>>>> I'd like the option to allow functions to calculate expressions (it
>>>>> can be used for a single identifier too, but that's too heavy-handed).
>>>>> It won't be required for an expression, but is a good way to avoid any
>>>>> issues regarding '__builtin_global_ref', like variables shadowing the
>>>>> global variable. Example:
>>>>> 
>>>>> int global;
>>>>> 
>>>>> struct foo;
>>>>> static int counted_by_calc(struct foo *);
>>>>> 
>>>>> struct foo {
>>>>> char count;
>>>>> int fnord;
>>>>> int *buf __counted_by(counted_by_calc);
>>>>> };
>>>>> 
>>>>> __attribute__((pure)) static int counted_by_calc(struct foo *ptr) {
>>>>> return ptr->count * (global << 42) - ptr->fnord;
>>>>> }
>>>>> 
>>>>> A pointer to the current least enclosing, non-anonymous struct is
>>>>> passed into 'counted_by_calc' by the compiler.
>>>>> 
>>>>> Rationale: This gets rid of all ambiguities when calculating an
>>>>> expression. It's marked 'pure' so there should be no side-effects.
>>>>> 
>>>>> ---
>>>>> 
>>>>> I believe these suggestions cover everything we've discussed. Please
>>>>> comment with anything I missed and your opinions on each.
>>>>> 
>>>>> [1] https://discourse.llvm.org/t/rfc-forward-referencing-a-struct-member-within-bounds-annotations/85510
>>>>> 
>>>>> Share and enjoy!
>>>>> -bw
>>> 
>>> 
>> 
> 
> -- 
> Univ.-Prof. Dr. rer. nat. Martin Uecker
> Graz University of Technology
> Institute of Biomedical Imaging
