> On Apr 1, 2025, at 15:25, Martin Uecker <uec...@tugraz.at> wrote:
> 
> On Tuesday, Apr 1, 2025, at 18:58 +0000, Qing Zhao wrote:
>> 
>>> On Apr 1, 2025, at 11:28, Martin Uecker <uec...@tugraz.at> wrote:
>>> 
>>> On Tuesday, Apr 1, 2025, at 15:01 +0000, Qing Zhao wrote:
>>>> 
>>>>> On Apr 1, 2025, at 10:04, Martin Uecker <uec...@tugraz.at> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On Monday, Mar 31, 2025, at 13:59 -0700, Bill Wendling wrote:
>>>>>>> I'd like to offer up this to solve the issues we're facing. This is a
>>>>>>> combination of everything that's been discussed here (or at least that
>>>>>>> I've been able to read in the centi-thread :-).
>>>>> 
>>>>> Thanks! I think this proposal is much better as it avoids undue burden
>>>>> on parsers, but it does not address all my concerns.
>>>>> 
>>>>> 
>>>>> From my side, the issue about compromising the scoping rules of C
>>>>> is also about unintended non-local effects of code changes. In
>>>>> my opinion, a change in a library header elsewhere should not cause 
>>>>> code in a local scope (which itself might also come from a macro) to
>>>>> emit a warning or require a programmer to add a workaround. So I am
>>>>> not convinced that adding warnings or a workaround such as
>>>>> __builtin_global_ref  is a good solution.
>>>>> 
>>>>> 
>>>>> I could see the following as a possible way forward: We only 
>>>>> allow the following two syntaxes:
>>>>> 
>>>>> 1. Single argument referring to a member.
>>>>> 
>>>>> __counted_by(len)
>>>>> 
>>>>> with an argument that must be a single identifier and where
>>>>> the identifier then must refer to a struct member. 
>>>>> 
>>>>> (I still think this is not ideal and potentially
>>>>> confusing, but in contrast to new scoping rules it is
>>>>> at least relatively easy to explain as a special rule.)
>>>>> 
>> 
>> So, in allowed syntax 1, the identifier inside the counted_by attribute will
>> be looked up inside the structure.
>> 
>> This is our current implementation of counted_by for FAMs, and of my
>> previously submitted patch for counted_by for pointers inside structures.
>> 
>> Keeping this syntax is good. 
>> 
>>>>> 
>>>>> 2. Forward declarations. 
>>>>> 
>>>>> __counted_by(size_t len; len + PADDING)
>>>> 
>>>> In the above, the PADDING is some constant? 
>>> 
>>> In principle - when considering only the name lookup rules -
>>> it could be a constant, a global variable, or an automatic
>>> variable, i.e. any ordinary identifier which is visible at
>>> this point. 
>> 
>> I am a little confused here:
>> Is this syntax 2 a new syntax, with name lookup rules different from
>> those of syntax 1?
> 
> Yes. It uses the regular C name lookup rules, unlike syntax 1.
> 
>> 
>> How should the identifiers inside the counted_by attribute be looked up
>> with this syntax?
>> Inside the structure first? Then, if not found, in the outer scope for
>> identifiers in the PADDING part?
> 
> The identifier in the forward declaration ("len") will be looked
> up in the structure and will be made available when parsing
> the expression.  Any other identifiers (such as "PADDING")
> will not be looked up in the structure.  So it is always
> clear where each identifier is going to be looked up.

Yeah, this sounds like a good idea to me, and a nice compromise solution.  :-)

Then, if more than one member needs to be in the expression, for example:

int number;

struct A {
   size_t count_1;
   size_t count_2;
   char *array __counted_by (size_t count_1; size_t count_2;
                             count_1 + count_2 + number * 4);
};

i.e., all the members that appear in the counted_by expression should be
declared first inside the counted_by; all other variables in the expression
are then looked up per the default scoping rules.

Is my understanding correct?

Qing

> 
>> Then, has a new scope been introduced now?
>> Or some other special lookup rules for the counted_by attribute?
>> 
>>> 
>>>> 
>>>> More complicated expressions involving globals will not be supported?
>>> 
>>> I think one could allow such expressions, but I think the
>>> expressions should be restricted to expressions which have
>>> no side effects. 
>> 
>> See my question above: does this new syntax 2 introduce a new “structure
>> scope” so that identifiers are looked up inside the structure first, as in
>> syntax 1?  Or does this new syntax use the same lookup rules as current C,
>> and will NOT look inside the structure first?
> 
> It will NOT look into the structure, except for the forward
> declared identifier.
> 
> 
> Martin
> 
>> 
>>> 
>>>> 
>>>>> where then the second part can also be a more complicated 
>>>>> expression, but with the explicit requirement that all
>>>>> identifiers in this expression are then looked up according to
>>>>> regular C language rules. So except for the forward declared
>>>>> member(s) they are *never* looked up in the member namespace of
>>>>> the struct, i.e. no new name lookup rules are introduced.
>>>> 
>>>> One question here:
>>>> 
>>>> What’s the major issue if we’d like to add one new scoping rule, for
>>>> example “structure scope” (the same as C++’s instance scope), to C?
>>>> 
>>>> (In addition to the "VLA in structure" issue I mentioned in my previous
>>>> writeup, is there any other issue preventing this new scoping rule from
>>>> being added to C?)
>>> 
>>> Note that the "VLA in structure" is a bit of a red herring.  The exact same
>>> issues apply to lookup of any other ordinary identifiers in this context.
>>> 
>>> enum { MY_BUF_SIZE = 100 };
>>> struct foo {
>>> char buf[MY_BUF_SIZE];
>>> };
>>> 
>> Yes, this is because there is NO “structure scope” available in C. Once a
>> “structure scope” is added to C, identifiers could be looked up inside the
>> “structure scope” first, before looking in outer scopes.
>> 
>>> 
>>> C++ has instance scope for member functions. The rules for C++ are also
>>> complex and not very consistent (see the examples I posted earlier,
>>> demonstrating UB and compiler divergence). 
>> 
>> Yes, I studied those C++ examples when I wrote the proposal. My observation
>> was: in C++, the instance scope always has higher priority than local and
>> global scopes, i.e., when there is a conflict between instance scope and
>> local/global scope for an identifier, the identifier within the instance
>> scope will shadow the one with the same name in the outer scope.
>> 
>> But in C, there is no concept of “structure scope” at all. Identifiers will
>> NOT be looked up inside a structure at all.
>> 
>>> For C such a concept would
>>> be new and much less useful, so the trade-off seems unfavorable (in
>>> contrast to C++ where it is needed).
>> 
>> This concept is needed whenever we need to refer to a member variable inside
>> the structure, such as in the counted_by attribute, or later when we extend
>> the C language to include the bound info in the TYPE.
>> 
>> But I agree with you that introducing a new instance scope into C might be
>> too risky.
>> 
>> 
>>> I also see other issues:  Fully
>>> supporting instance scope would require changes to how C is parsed, 
>>> placing a burden on all C compilers and tooling. Depending on how you 
>>> specify it, it would also cause a change in semantics
>>> for existing code, something C tries very hard to avoid.
>> 
>> Yes, agreed. 
>> Introducing a new instance scope in C might be too risky, and therefore not
>> worth doing.
>> 
>> 
>>> If you add
>>> warnings as mitigation,  it has the problem that it causes non-local
>>> effects where introducing a name in an enclosing scope somewhere else
>>> now necessitates a change to unrelated code, exactly what scoping rules
>>> are meant to prevent.  
>> 
>> Yes, that’s right. 
>>> 
>>> In any case, it seems a major change with many ramifications, including
>>> possibly unintended ones. This should certainly not be done without
>>> having a clear specification and support from WG14 (and probably not
>>> done at all.)
>> 
>> Yes, I agree. 
>> 
>> Qing
>>> 
>>> Martin
>>> 
>>>> 
>>>> Qing
>>>> 
>>>> 
>>>>> 
>>>>> 
>>>>> I think this could address my concerns about breaking
>>>>> scoping in C. Still, I personally would prefer designator syntax
>>>>> for both C and C++ as a nicer solution, and one that already
>>>>> has some support from WG14.
>>>>> 
>>>>> Martin
>>>>> 
>>>>> 
>>>>>>> 
>>>>>>> ---
>>>>>>> 
>>>>>>> 1. The use of '__self' isn't feasible, so we won't use it. Instead,
>>>>>>> we'll rely upon the current behavior—resolving any identifiers to the
>>>>>>> "instance scope". This new scope is used __only__ in attributes, and
>>>>>>> resolves identifiers to those in the least enclosing, non-anonymous
>>>>>>> struct. For example:
>>>>>>> 
>>>>>>> struct foo {
>>>>>>> char count;
>>>>>>> struct bar {
>>>>>>>  struct {
>>>>>>>    int len;
>>>>>>>  };
>>>>>>>  struct {
>>>>>>>    struct {
>>>>>>>      int *valid_use __counted_by(len); // Valid.
>>>>>>>    };
>>>>>>>  };
>>>>>>>  int *invalid_use __counted_by(count); // Invalid.
>>>>>>> } b;
>>>>>>> };
>>>>>>> 
>>>>>>> Rationale: This is how '__guarded_by' currently resolves identifiers,
>>>>>>> so there's precedent. And if we can't force its usage in all
>>>>>>> situations, it's less a feature and more a "nicety" which will lead to
>>>>>>> a massive discrepancy between compiler implementations. Despite the
>>>>>>> fact that this introduces a new scoping mechanism to C, its use is not
>>>>>>> as extensive as C++'s instance scoping and will apply only to
>>>>>>> attributes. In the case where we have two different resolution
>>>>>>> techniques happening within the same structure (e.g. VLAs), we can
>>>>>>> issue warnings as outlined in Yeoul's RFC[1].
>>>>>>> 
>>>>>>> 2. A method of forward declaring variables will be added for variables
>>>>>>> that occur in the struct after the attribute. For example:
>>>>>>> 
>>>>>>> A: Necessary usage:
>>>>>>> 
>>>>>>> struct foo {
>>>>>>> int *buf __counted_by(char count; count);
>>>>>>> char count;
>>>>>>> };
>>>>>>> 
>>>>>>> B: Unnecessary, but still valid, usage:
>>>>>>> 
>>>>>>> struct foo {
>>>>>>> char count;
>>>>>>> int *buf __counted_by(char count; count);
>>>>>>> };
>>>>>>> 
>>>>>>> * The forward declaration is required in (A) but not in (B).
>>>>>>> * The type of 'count' as declared in '__counted_by' *must* match the 
>>>>>>> real type.
>>>>>>> 
>>>>>>> Rationale: This alleviates the issues of "double parsing" for
>>>>>>> compilers that aren't able to handle it. (We can also remove the
>>>>>>> '-fexperimental-late-parse-attributes' flag in Clang.)
>>>>>>> 
>>>>>>> 3. A new builtin '__builtin_global_ref()' (or similarly named) is
>>>>>>> added to refer to variables outside of the most-enclosing structure.
>>>>>>> Example:
>>>>>>> 
>>>>>>> int count_that_will_never_change_we_promise;
>>>>>>> 
>>>>>>> struct foo {
>>>>>>> int *bar 
>>>>>>> __counted_by(__builtin_global_ref(count_that_will_never_change_we_promise));
>>>>>>> unsigned flags;
>>>>>>> };
>>>>>>> 
>>>>>>> As Yeoul pointed out, there isn't a way to refer to variables that
>>>>>>> have been shadowed, so the 'global' in '__builtin_global_ref' is a bit
>>>>>>> of a misnomer as it could refer to a local variable.
>>>>>>> 
>>>>>>> Rationale: For those who need the flexibility to use variables outside
>>>>>>> of the struct, this is an acceptable escape route. It does make bounds
>>>>>>> checking less strict, though, as we can't track any modifications to
>>>>>>> the global, so caution must be used.
>>>>>>> 
>>>>>>> Bonus suggestion (by yours truly):
>>>>>>> 
>>>>>>> I'd like the option to allow functions to calculate expressions (it
>>>>>>> can be used for a single identifier too, but that's too heavy-handed).
>>>>>>> It won't be required for an expression, but is a good way to avoid any
>>>>>>> issues regarding '__builtin_global_ref', like variables shadowing the
>>>>>>> global variable. Example:
>>>>>>> 
>>>>>>> int global;
>>>>>>> 
>>>>>>> struct foo;
>>>>>>> static int counted_by_calc(struct foo *);
>>>>>>> 
>>>>>>> struct foo {
>>>>>>> char count;
>>>>>>> int fnord;
>>>>>>> int *buf __counted_by(counted_by_calc);
>>>>>>> };
>>>>>>> 
>>>>>>> __attribute__((pure)) static int counted_by_calc(struct foo *ptr) {
>>>>>>> return ptr->count * (global << 42) - ptr->fnord;
>>>>>>> }
>>>>>>> 
>>>>>>> A pointer to the current least enclosing, non-anonymous struct is
>>>>>>> passed into 'counted_by_calc' by the compiler.
>>>>>>> 
>>>>>>> Rationale: This gets rid of all ambiguities when calculating an
>>>>>>> expression. It's marked 'pure' so there should be no side-effects.
>>>>>>> 
>>>>>>> ---
>>>>>>> 
>>>>>>> I believe these suggestions cover everything we've discussed. Please
>>>>>>> comment with anything I missed and your opinions on each.
>>>>>>> 
>>>>>>> [1] https://discourse.llvm.org/t/rfc-forward-referencing-a-struct-member-within-bounds-annotations/85510
>>>>>>> 
>>>>>>> Share and enjoy!
>>>>>>> -bw
>>>>> 
>>>>> 
>>>> 
>>> 
>>> -- 
>>> Univ.-Prof. Dr. rer. nat. Martin Uecker
>>> Graz University of Technology
>>> Institute of Biomedical Imaging

