> On Apr 1, 2025, at 15:25, Martin Uecker <uec...@tugraz.at> wrote:
>
> On Tuesday, 01.04.2025 at 18:58 +0000, Qing Zhao wrote:
>>
>>> On Apr 1, 2025, at 11:28, Martin Uecker <uec...@tugraz.at> wrote:
>>>
>>> On Tuesday, 01.04.2025 at 15:01 +0000, Qing Zhao wrote:
>>>>
>>>>> On Apr 1, 2025, at 10:04, Martin Uecker <uec...@tugraz.at> wrote:
>>>>>
>>>>> On Monday, 31.03.2025 at 13:59 -0700, Bill Wendling wrote:
>>>>>>> I'd like to offer up this to solve the issues we're facing. This is a
>>>>>>> combination of everything that's been discussed here (or at least that
>>>>>>> I've been able to read in the centi-thread :-).
>>>>>
>>>>> Thanks! I think this proposal is much better as it avoids undue burden
>>>>> on parsers, but it does not address all my concerns.
>>>>>
>>>>> From my side, the issue about compromising the scoping rules of C
>>>>> is also about unintended non-local effects of code changes. In
>>>>> my opinion, a change in a library header elsewhere should not cause
>>>>> code in a local scope (which itself might also come from a macro) to
>>>>> emit a warning or require a programmer to add a workaround. So I am
>>>>> not convinced that adding warnings or a workaround such as
>>>>> __builtin_global_ref is a good solution.
>>>>>
>>>>> I could see the following as a possible way forward: We only
>>>>> allow the following two syntaxes:
>>>>>
>>>>> 1. Single argument referring to a member.
>>>>>
>>>>>    __counted_by(len)
>>>>>
>>>>> with an argument that must be a single identifier and where
>>>>> the identifier then must refer to a struct member.
>>>>>
>>>>> (I still think this is not ideal and potentially
>>>>> confusing, but in contrast to new scoping rules it is
>>>>> at least relatively easy to explain as a special rule.)
>>>>>
>>
>> So, in allowed syntax 1, the identifier inside the counted_by attribute
>> will be looked up inside the structure.
>>
>> This is our current implementation of counted_by for FAMs and of my
>> previously submitted patch for counted_by for pointers inside structures.
>>
>> Keeping this syntax is good.
>>
>>>>>
>>>>> 2. Forward declarations.
>>>>>
>>>>>    __counted_by(size_t len; len + PADDING)
>>>>
>>>> In the above, the PADDING is some constant?
>>>
>>> In principle - when considering only the name lookup rules -
>>> it could be a constant, a global variable, or an automatic
>>> variable, i.e. any ordinary identifier that is visible at
>>> this point.
>>
>> I am a little confused here:
>> Is this syntax 2 a new syntax, with name lookup rules different from
>> those of syntax 1?
>
> Yes. It uses the regular C name lookup rules, unlike syntax 1.
>
>>
>> How should the identifiers inside the counted_by attribute with this
>> syntax be looked up?
>> Inside the structure first? Then, if not found, in the outer scope for
>> the identifiers in the PADDING part?
>
> The identifier in the forward declaration ("len") will be looked
> up in the structure and will be made available when parsing
> the expression. Any other identifiers (such as "PADDING")
> will not be looked up in the structure. So it is always
> clear where each identifier is going to be looked up.

Yeah, this sounds like a good idea to me, and a nice compromise solution. :-)

Then, if more than one member needs to be in the expression, for example:

int number;
struct A {
  size_t count_1;
  size_t count_2;
  char *array __counted_by (size_t count_1; size_t count_2; count_1 + count_2 + number * 4);
};

i.e., all the members that appear in the counted_by expression should be
declared first inside the counted_by, and all other variables in the
expression are then looked up per the default scoping rules.

Is the understanding correct?

Qing

>
>> Then, has a new scoping been introduced now?
>> Or some other special lookup rules for the counted_by attribute?
>>
>>>
>>>>
>>>> More complicated expressions involving globals will not be supported?
>>>
>>> I think one could allow such expressions, but I think the
>>> expressions should be restricted to expressions which have
>>> no side effects.
>>
>> See my question above: does this new syntax 2 introduce a new
>> “structure scope” so that identifiers are looked up inside the
>> structure first, as in syntax 1? Or does this new syntax have the same
>> lookup rule as current C, i.e. it will NOT look inside the structure
>> first?
>
> It will NOT look into the structure, except for the forward
> declared identifier.
>
> Martin
>
>>
>>>
>>>>
>>>>> where then the second part can also be a more complicated
>>>>> expression, but with the explicit requirement that all
>>>>> identifiers in this expression are then looked up according to
>>>>> regular C language rules. So except for the forward declared
>>>>> member(s) they are *never* looked up in the member namespace of
>>>>> the struct, i.e. no new name lookup rules are introduced.
>>>>
>>>> One question here:
>>>>
>>>> What’s the major issue if we’d like to add one new scoping rule, for
>>>> example, “Structure scope” (the same as C++’s instance scope), to C?
>>>>
>>>> (In addition to the "VLA in structure" issue I mentioned in my previous
>>>> writeup, is there any other issue that prevents this new scoping rule
>>>> from being added to C?)
>>>
>>> Note that the "VLA in structure" is a bit of a red herring. The exact same
>>> issues apply to lookup of any other ordinary identifiers in this context.
>>>
>>> enum { MY_BUF_SIZE = 100 };
>>> struct foo {
>>>   char buf[MY_BUF_SIZE];
>>> };
>>>
>> Yes, this is because there is NO “structure scope” available in C. Once
>> a “structure scope” is added to C, identifiers could be looked up inside
>> the “structure scope” first, before the outer scopes are searched.
>>
>>>
>>> C++ has instance scope for member functions. The rules for C++ are also
>>> complex and not very consistent (see the examples I posted earlier,
>>> demonstrating UB and compiler divergence).
>>
>> Yes, I studied those C++ examples when I wrote the proposal. My
>> observation was: in C++, the instance scope always has higher priority
>> than local and global scopes, i.e., when there is a conflict between
>> instance scope and local/global scope for an identifier, the identifier
>> within the instance scope shadows the one with the same name in the
>> outer scope.
>>
>> But in C, there is no concept of “structure scope” at all. Identifiers
>> will NOT be looked up inside a structure at all.
>>
>>> For C such a concept would
>>> be new and much less useful, so the trade-off seems unfavorable (in
>>> contrast to C++ where it is needed).
>>
>> This concept is needed whenever a member variable has to be referenced
>> inside the structure, such as in the counted_by attribute, or later when
>> we extend the C language to include the bound info in the TYPE.
>>
>> But I agree with you that introducing a new instance scope into C might
>> be too risky.
>>
>>
>>> I also see other issues: Fully
>>> supporting instance scope would require changes to how C is parsed,
>>> placing a burden on all C compilers and tooling. Depending on how you
>>> specify it, it would also cause a change in semantics
>>> for existing code, something C tries very hard to avoid.
>>
>> Yes, agreed.
>> Introducing a new instance scope in C might be too risky, and therefore
>> not worth doing.
>>
>>
>>> If you add
>>> warnings as mitigation, it has the problem that it causes non-local
>>> effects where introducing a name in an enclosing scope somewhere else
>>> now necessitates a change to unrelated code, exactly what scoping rules
>>> are meant to prevent.
>>
>> Yes, that’s right.
>>>
>>> In any case, it seems a major change with many ramifications, including
>>> possibly unintended ones. This should certainly not be done without
>>> having a clear specification and support from WG14 (and probably not
>>> done at all.)
>>
>> Yes, I agree.
>>
>> Qing
>>>
>>> Martin
>>>
>>>>
>>>> Qing
>>>>
>>>>
>>>>>
>>>>>
>>>>> I think this could address my concerns about breaking
>>>>> scoping in C. Still, I personally would prefer designator syntax
>>>>> for both C and C++ as a nicer solution, and one that already
>>>>> has some support from WG14.
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> 1. The use of '__self' isn't feasible, so we won't use it. Instead,
>>>>>>> we'll rely upon the current behavior: resolving any identifiers to the
>>>>>>> "instance scope". This new scope is used __only__ in attributes, and
>>>>>>> resolves identifiers to those in the least enclosing, non-anonymous
>>>>>>> struct. For example:
>>>>>>>
>>>>>>> struct foo {
>>>>>>>   char count;
>>>>>>>   struct bar {
>>>>>>>     struct {
>>>>>>>       int len;
>>>>>>>     };
>>>>>>>     struct {
>>>>>>>       struct {
>>>>>>>         int *valid_use __counted_by(len); // Valid.
>>>>>>>       };
>>>>>>>     };
>>>>>>>     int *invalid_use __counted_by(count); // Invalid.
>>>>>>>   } b;
>>>>>>> };
>>>>>>>
>>>>>>> Rationale: This is how '__guarded_by' currently resolves identifiers,
>>>>>>> so there's precedent. And if we can't force its usage in all
>>>>>>> situations, it's less a feature and more a "nicety" which will lead to
>>>>>>> a massive discrepancy between compiler implementations. Despite the
>>>>>>> fact that this introduces a new scoping mechanism to C, its use is not
>>>>>>> as extensive as C++'s instance scoping and will apply only to
>>>>>>> attributes. In the case where we have two different resolution
>>>>>>> techniques happening within the same structure (e.g. VLAs), we can
>>>>>>> issue warnings as outlined in Yeoul's RFC[1].
>>>>>>>
>>>>>>> 2. A method of forward declaring variables will be added for variables
>>>>>>> that occur in the struct after the attribute. For example:
>>>>>>>
>>>>>>> A: Necessary usage:
>>>>>>>
>>>>>>> struct foo {
>>>>>>>   int *buf __counted_by(char count; count);
>>>>>>>   char count;
>>>>>>> };
>>>>>>>
>>>>>>> B: Unnecessary, but still valid, usage:
>>>>>>>
>>>>>>> struct foo {
>>>>>>>   char count;
>>>>>>>   int *buf __counted_by(char count; count);
>>>>>>> };
>>>>>>>
>>>>>>> * The forward declaration is required in (A) but not in (B).
>>>>>>> * The type of 'count' as declared in '__counted_by' *must* match the
>>>>>>>   real type.
>>>>>>>
>>>>>>> Rationale: This alleviates the issues of "double parsing" for
>>>>>>> compilers that aren't able to handle it. (We can also remove the
>>>>>>> '-fexperimental-late-parse-attributes' flag in Clang.)
>>>>>>>
>>>>>>> 3. A new builtin '__builtin_global_ref()' (or similarly named) is
>>>>>>> added to refer to variables outside of the most-enclosing structure.
>>>>>>> Example:
>>>>>>>
>>>>>>> int count_that_will_never_change_we_promise;
>>>>>>>
>>>>>>> struct foo {
>>>>>>>   int *bar
>>>>>>>     __counted_by(__builtin_global_ref(count_that_will_never_change_we_promise));
>>>>>>>   unsigned flags;
>>>>>>> };
>>>>>>>
>>>>>>> As Yeoul pointed out, there isn't a way to refer to variables that
>>>>>>> have been shadowed, so the 'global' in '__builtin_global_ref' is a bit
>>>>>>> of a misnomer as it could refer to a local variable.
>>>>>>>
>>>>>>> Rationale: For those who need the flexibility to use variables outside
>>>>>>> of the struct, this is an acceptable escape route. It does make bounds
>>>>>>> checking less strict, though, as we can't track any modifications to
>>>>>>> the global, so caution must be used.
>>>>>>>
>>>>>>> Bonus suggestion (by yours truly):
>>>>>>>
>>>>>>> I'd like the option to allow functions to calculate expressions (it
>>>>>>> can be used for a single identifier too, but that's too heavy-handed).
>>>>>>> It won't be required for an expression, but is a good way to avoid any
>>>>>>> issues regarding '__builtin_global_ref', like variables shadowing the
>>>>>>> global variable. Example:
>>>>>>>
>>>>>>> int global;
>>>>>>>
>>>>>>> struct foo;
>>>>>>> static int counted_by_calc(struct foo *);
>>>>>>>
>>>>>>> struct foo {
>>>>>>>   char count;
>>>>>>>   int fnord;
>>>>>>>   int *buf __counted_by(counted_by_calc);
>>>>>>> };
>>>>>>>
>>>>>>> static int counted_by_calc(struct foo *ptr) __attribute__((pure)) {
>>>>>>>   return ptr->count * (global << 42) - ptr->fnord;
>>>>>>> }
>>>>>>>
>>>>>>> A pointer to the current least enclosing, non-anonymous struct is
>>>>>>> passed into 'counted_by_calc' by the compiler.
>>>>>>>
>>>>>>> Rationale: This gets rid of all ambiguities when calculating an
>>>>>>> expression. It's marked 'pure' so there should be no side-effects.
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> I believe these suggestions cover everything we've discussed.
>>>>>>> Please comment with anything I missed and your opinions on each.
>>>>>>>
>>>>>>> [1] https://discourse.llvm.org/t/rfc-forward-referencing-a-struct-member-within-bounds-annotations/85510
>>>>>>>
>>>>>>> Share and enjoy!
>>>>>>> -bw
>>>>>
>>>>
>>>
>>> --
>>> Univ.-Prof. Dr. rer. nat. Martin Uecker
>>> Graz University of Technology
>>> Institute of Biomedical Imaging
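
For reference, here is a rough sketch in today's C of what the forward-declaration form (syntax 2) from this thread is meant to express. It is only an illustration under assumptions, not an implementation of the proposal: COUNTED_BY_SKETCH is a hypothetical macro that expands to nothing so the annotation can be written down and still compile, and a_element_count(), a_alloc(), and the initial value of 'number' are made up for this sketch. The point is the lookup rule: the forward-declared members (count_1, count_2) are read from the struct instance, while every other identifier (number) is found by the regular C scoping rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed attribute syntax. It expands to
   nothing, so no compiler support for forward declarations is assumed. */
#define COUNTED_BY_SKETCH(...)

/* Ordinary identifier at file scope: found by regular C name lookup.
   The value 2 is an arbitrary choice for this sketch. */
int number = 2;

struct A {
    size_t count_1;
    size_t count_2;
    /* Forward-declared members first, then the count expression. */
    char *array COUNTED_BY_SKETCH(size_t count_1; size_t count_2;
                                  count_1 + count_2 + number * 4);
};

/* Hand-written equivalent of the annotated expression: count_1 and
   count_2 come from the instance, number from the enclosing scope. */
size_t a_element_count(const struct A *a)
{
    assert(a != NULL);
    return a->count_1 + a->count_2 + (size_t)number * 4;
}

/* Allocate 'array' with exactly the number of elements the annotation
   describes. Returns 0 on success and -1 on allocation failure. */
int a_alloc(struct A *a)
{
    a->array = calloc(a_element_count(a), sizeof *a->array);
    return a->array != NULL ? 0 : -1;
}
```

With count_1 = 3, count_2 = 5 and number = 2, a_element_count() gives 3 + 5 + 2 * 4 = 16. Changing 'number' later changes the computed count without touching the struct, which is the kind of untracked modification the thread is cautious about.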