+ Aaron
> On Mar 6, 2025, at 4:27 PM, Yeoul Na <yeoul...@apple.com> wrote:
>
> Hi Qing,
>
> Thanks for writing up the RFC and keeping us in the loop. Are you planning to
> add “__self.” to GCC's C++ compiler as well in the future? The problem we
> have with “__self” being a default way of annotating bounds is C++
> compatibility, because bounds annotations are supposed to work in headers
> shared between C and C++ and C++ should be able to parse it to secure the
> boundary between the two languages. Another problem is the usability. The
> user will have to write more code “__self.” all the time in the most common
> use cases, which would be a huge regression for the usability of the language.
>
> We are planning to write up an alternative proposal without having to
> introduce a new syntax to the C standard. We’ll discuss how we address
> problems raised here. Please see my inlined comments.
>
> Best,
> Yeoul
>
>
>> On Mar 6, 2025, at 2:03 PM, Yeoul Na <yeoul...@apple.com> wrote:
>>
>> + John & Félix & Patryk & Henrik
>>
>>> On Mar 6, 2025, at 1:44 PM, Qing Zhao <qing.z...@oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> Since I sent the patch series for “extend counted_by attribute to pointer
>>> fields of structure” two months ago, a lot of discussion has taken place
>>> in both the GCC community and the Clang community:
>>>
>>> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html
>>> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/131?u=gwelymernans
>>>
>>> After reading all these discussions, understanding, studying, more
>>> discussions, and finally making the whole picture clearer, we came up with
>>> a proposal to change the current design and add a new syntax for the
>>> argument of counted_by attribute.
>>>
>>> The original idea of the new syntax was from Joseph, Michael and Martin,
>>> Bill and Kees involved in the whole process of the proposal, providing a
>>> lot of suggestions and comments.
>>> I really appreciate the help from all of them.
>>>
>>> In this thread, I am also CC’ing several people from Apple who worked on
>>> the -fbounds-safety project on the Clang side: yeoul...@apple.com,
>>> d_tard...@apple.com, dl...@apple.com, and dcough...@apple.com.
>>>
>>> Please take a look at the proposal below.
>>>
>>> Let me know if you have any comments and suggestions.
>>>
>>> Thanks.
>>>
>>> Qing.
>>>
>>> =========================================
>>>
>>> New syntax for the argument of counted_by attribute
>>> --An extension to C language
>>>
>>> Outline
>>>
>>> 0. A simple summary of the proposal
>>>
>>> 1. The motivation
>>>    1.1 The current syntax of the counted_by argument might break existing
>>>        legal C code
>>>    1.2 New requests from the users of the counted_by attribute
>>>        1.2.1 Refer to a field in the nested structure
>>>        1.2.2 Refer to globals or locals
>>>        1.2.3 Represent simple expression
>>>        1.2.4 Forward referencing
>>>
>>> 2. The requirement
>>>
>>> 3. The proposed new syntax
>>>    3.1 Legal C code with VLA works correctly when mixing with counted_by
>>>    3.2 Satisfy all the new requests
>>>        3.2.1 Refer to a field in the nested structure
>>>        3.2.2 Refer to globals or locals
>>>        3.2.3 Represent simple expression
>>>    3.3 How to resolve the forward reference issue in section 1.2.4?
>>>
>>> Appendix A: Scope of variables in C and C++
>>>             --The hints to the design of counted_by in C
>>> Appendix B: An example in linux kernel that the global cannot be "const"
>>>             qualified
>>>
>>>
>>> 0. A simple summary of the proposal
>>>
>>> We propose a new syntax to the argument of the counted_by attribute:
>>>
>>> * Introduce a new keyword, __self, to represent the new concept,
>>>   "the current object" of the nearest non-anonymous enclosing structure,
>>>   which allows the object of the structure to refer to its own member
>>>   inside the structure definition.
>>>
>>> * With the new keyword, __self, the member variable can be referenced
>>>   by appending the member access operator "." to "__self", such as,
>>>   __self.member.
>>>
>>> * This new keyword is invalid except in the bounds checking attributes,
>>>   such as "counted_by", etc., inside a structure definition.
>>>
>>> * Simple expression is enabled by this new keyword inside the attribute
>>>   counted_by with the following limitation:
>>>   A. no side-effect is allowed;
>>>   and
>>>   B. the operators of the expression are simple arithmetic operators, and
>>>      the operands could be one of:
>>>      B.1 __self.member or __self.member1.member2...(for nested structure);
>>>      B.2 constant;
>>>      B.3 locals that will not be changed after initialization;
>>>      B.4 globals that will not be changed after initialization;
>>>
>>>
>>> 1. The motivation
>>>
>>> There are two major motivations for this new syntax.
>>>
>>> 1.1 The current syntax of the counted_by argument might break existing
>>>     legal C code
>>>
>>> The counted_by attribute is currently defined as:
>>> (https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-counted_005fby-variable-attribute)
>>>
>>> counted_by (count)
>>> The counted_by attribute may be attached to the C99 flexible array member
>>> of a structure. It indicates that the number of the elements of the array
>>> is given by the field "count" in the same structure as the flexible array
>>> member.
>>>
>>> For example:
>>>
>>> int count;
>>> struct X {
>>>   int count;
>>>   char array[] __attribute__ ((counted_by (count)));
>>> };
>>>
>>> In the above, the argument of the attribute "count" is an identifier that
>>> will be looked up in the scope of the enclosing structure "X". Due to this
>>> new scope of variable, the identifier "count" refers to the member
>>> variable "count" of this structure but not the global variable defined
>>> outside of the structure.
>>>
>>> This is a new scope of variable that is added to the C language. In C, the
>>> default available scopes of variable include only two scopes, global scope
>>> and local scope.
>>>
>>> The global scope refers to the region outside any function or block.
>>> The variables declared here are accessible throughout the entire program.
>>>
>>> The local scope refers to the region enclosed between the { } braces, which
>>> represent the boundary of a function or a block inside functions. The
>>> variables declared within a function or a block are only accessible
>>> locally inside that function or that block and other blocks nested inside.
>>>
>>> (Please see Appendix A for more details on scope of variables in C and C++
>>> and why the current design of counted_by attribute is a disaster to C)
>>>
>>> Note, the { } brace that marks the boundary of a structure does not change
>>> the current scope of the variable with the default scoping rules in C.
>>>
>>> As a result, in the above example, with C's default scoping rule, the
>>> "count" inside counted_by attribute _should_ refer to the global variable
>>> "count" but not the member variable in the enclosing structure.
>>>
>>> A more compelling example is shown below when mixing counted_by attribute
>>> with C's Variable Length Array (VLA).
>>>
>>> void boo (int k)
>>> {
>>>   const int n = 10; // a local variable n
>>>   struct foo {
>>>     int n; // a member variable n
>>>     int a[n + 10]; // for VLA, this n refers to the local variable n.
>>>     char b[] __attribute__ ((counted_by(n)));
>>>     // for counted_by, this n refers to the member variable n.
>>>   };
>>> }
>>>
>>> This code is bad. The size expression "n+10" of the VLA "a" follows the
>>> default scoping rule of C; as a result, "n" refers to the local variable
>>> "n" that is defined outside of the structure "foo". However, the argument
>>> "n" of the counted_by attribute of the flexible array member b[] follows
>>> the new scoping rule: it refers to the member variable "n" inside this
>>> structure.
>>>
>>> It's clear that the current design of the counted_by argument introduced a
>>> new scoping rule into C, resulting in an inconsistent scoping resolution
>>> situation in C language.
>>>
>>> This is a design mistake, and should be fixed.
>
> We will have a different proposal based on reporting diagnostics on the name
> conflicts. We need to diagnose the name conflicts like above anyway because
> in code like that almost always the struct contains a buffer and its size as
> the fields. Given that, the program’s intention would be more likely to pick
> up the member `n`, instead of some random global that happened to have the
> same name in the same translation unit. Therefore, we should diagnose such
> cases to avoid mistakes and avoid the program silently working in an
> unintended way because of a user mistake. Also, this program will have a
> different meaning in C++, so that’s another reason to always diagnose such
> ambiguity.
> Also, the bounds annotation user might have just forgotten to add “__self.”
> because it’s so intuitive to use the member name inside the attributes (I
> know what’s “intuitive” depends on people’s background, but that’s what we
> observed from massive adoption experience within Apple). This leaves the
> feature error-prone, because the most intuitive syntax for bounds annotations
> will be compiled into a different meaning (using the global as the size
> instead of the peer member).
> So we should really diagnose it even if we add “__self” to avoid the mistake.
>
> Now, if we always diagnose it, then the lookup order doesn’t really matter
> anymore. That means we will have an option to keep the current lookup rule of
> C, and pick up the member name only when the global name is not available
> (just one possible option). I see “__self.” being used as a suppression
> mechanism if the programmer cannot change the name of the conflicting global
> or member. But that doesn’t mean “__self” should be a default way of writing
> the code. Suppression mechanisms are typically only used to suppress the
> warnings and disambiguate. And this would mean we also need a way to
> disambiguate it to mean global. C++ already has `::`, but C doesn’t currently
> have a scope qualifier, so in order to use this new bounds safety feature we
> may need to invent something. Adding a new syntax is a risk, so until we
> standardize it I would suggest something like `__builtin_global_ref()`.
>

Another thing to note: Clang has already started adding such diagnostics to
avoid the ambiguity. Here is the PR from Aaron:
https://github.com/llvm/llvm-project/pull/129772

>
>>>
>>> 1.2 New requests from the users of the counted_by attribute
>>>
>>> The counted_by attribute for Flexible Array Member (FAM) has been adopted
>>> in Linux Kernel extensively. New requests came in in order to cover more
>>> cases.
>>>
>>> 1.2.1 Refer to a field in the nested structure
>>>
>>> This was requested from linux kernel.
>>> https://www.spinics.net/lists/linux-rdma/msg127560.html
>>>
>>> A simplified testing case is:
>>>
>>> struct Y {
>>>   int n;
>>>   int other;
>>> };
>>>
>>> struct Z {
>>>   struct Y y;
>>>   int array[] __attribute__ ((counted_by(?y.n)));
>>> };
>>>
>>> in the above, what should be put instead of "?" to refer to the field "n"
>>> of the field "y" of the current object of this struct Z?
>>>
>>> NOTE, we should completely reject the use cases that refer to a field in
>>> an outer structure from an inner non-anonymous structure, such as:
>>>
>>> struct A {
>>>   int count;
>>>   struct B {
>>>     int other;
>>>     int z[] __attribute__ ((counted_by(?)));
>>>   } b;
>>> };
>>>
>>> In the above, we should not allow the counted_by "?" of the FAM field "z"
>>> of the struct B to refer to the member variable "count" of the outer
>>> struct A. Otherwise, when an object of type struct B is passed to a
>>> function, there will be an error when referring to the counted_by of its
>>> field "z".
>>>
>>> However, the counted_by attribute of a field in the inner anonymous
>>> structure should be allowed to refer to a field of the outer structure,
>>> since the inner anonymous structure can not be used independently of its
>>> enclosing structure, such as:
>>>
>>> struct A {
>>>   int count;
>>>   struct {
>>>     int other;
>>>     int z[] __attribute__ ((counted_by(count)));
>>>   };
>>> } a;
>>>
>>> In the above testing case, the counted_by attribute for the field "z" of
>>> the inner anonymous structure should be able to refer to the field of the
>>> outer structure.
>
> I couldn’t get the relation between the named nested struct and anonymous
> struct here. Members of anonymous structure are essentially part of the outer
> struct. And the members are already accessed the same as direct members of
> the outer struct. It should work as below:
>
>
> struct A {
>   int count;
>   struct B {
>     int other;
>     int z[] __attribute__ ((counted_by(count)));
>     // error: reference to undefined identifier `count`.
>   } b;
> };
>
>
> struct A {
>   int count;
>   struct {
>     int other;
>     int z[] __attribute__ ((counted_by(count)));
>     // works as members of anonymous structure is part of structure A
>   };
> } a;
>
>
> So I don’t see why this will prevent us from doing (counted_by(y.n)) without
> needing any additional prefix.
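
For readers following the named-versus-anonymous distinction above, here is a
minimal sketch in plain C11. It deliberately leaves out the counted_by
attribute so that it builds with any C11 compiler; the field names `other` and
`extra` and the helper `sum_fields` are made up for illustration and do not
come from the thread. It only shows the access paths: a member of an anonymous
struct is reached like a direct member of the enclosing struct, while a member
of a named nested struct needs the `b.` path.

```c
#include <assert.h>
#include <stddef.h>

struct A {
    int count;
    struct B {          /* named nested struct, reachable as member b */
        int other;
    } b;
    struct {            /* anonymous struct (C11): members join struct A */
        int extra;
    };
};

/* Hypothetical helper: adds up one field from each nesting style. */
int sum_fields(const struct A *a)
{
    assert(a != NULL);
    /* a->b.other needs the member path; a->extra does not. */
    return a->count + a->b.other + a->extra;
}
```

Whether an attribute argument should follow the same paths is exactly what is
being debated here; the sketch only illustrates the existing member access
rules.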
>
>
>>>
>>>
>>> 1.2.2 Refer to globals or locals
>>>
>>> One request from linux kernel is here:
>>> https://lore.kernel.org/all/202309221128.6AC35E3@keescook/
>>>
>>> A simple example is:
>>>
>>> int count; // global variable
>>> struct X {
>>>   int count; // member variable
>>>   char array[] __attribute__ ((counted_by(??count)));
>>>   // How to refer to the global variable "count"
>>>   // but not the member variable "count" of the struct X?
>>> };
>>>
>>> when the counted_by attribute tries to refer to the global variable
>>> "count" outside the structure, how to distinguish it from its member
>>> variable "count"?
>
> Again, this should be diagnosed and the programmer either needs to change the
> name or use a suppression mechanism. As I suggested earlier we can introduce
> something like __builtin_global_ref(), until we get a blessing from the C
> committee to add a scope qualifier syntax in C.
>
>>>
>>> NOTE, Users need to make sure that the global or local variables are not
>>> changed after they are initialized; otherwise, the results of the array
>>> bound sanitizer and the __builtin_dynamic_object_size are undefined.
>>>
>>> Theoretically, We should limit the globals and locals ONLY to const
>>> qualified globals and locals to avoid abusing of this feature in the
>>> future. However, the existing code in linux kernel cannot be easily
>>> changed to use the const qualifier, so we have to relax the limitation.
>>> See Appendix B for such an example in linux kernel.
>>>
>>> In the future language extension, We should limit the globals and locals
>>> ONLY to const qualified globals and locals.
>>>
>>> 1.2.3 Represent simple expression
>>>
>>> This was requested multiple times from Linux kernel.
>>> One of the requests is:
>>> https://lore.kernel.org/lkml/20210727205855.411487-63-keesc...@chromium.org/
>>>
>>> For example:
>>>
>>> int elm_size;
>>> struct X {
>>>   int count;
>>>   char array[] __attribute__ ((counted_by(?count * elm_size)));
>>> };
>>>
>>> in the above, what should be put instead of "?" to represent this simple
>>> expression?
>
> It should just work without any prefix because there’s no name conflict here,
> it will be clear what each unqualified name is referring to.
>
> constexpr int elm_size;
> struct X {
>   int count;
>   char array[] __attribute__ ((counted_by(count * elm_size)));
> };
>
> I think this is not too different from this:
>
> int elem_size;
> int foo(void) {
>   int count;
>   return count * elem_size;
> };
>
>
>
>>>
>>> NOTE, We should limit simple expressions to:
>>>
>>> A. no side-effect is allowed,
>>> and
>>> B. the operators of the expression are simple arithmetic operators, and
>>>    the operands could be one of the following:
>>>    B.1 the member variable of the enclosing structure or inner structure
>>>        of the enclosing structure;
>>>    B.2 constant;
>>>    B.3 locals that will not be changed after initialization;
>>>    B.4 globals that will not be changed after initialization;
>>>
>>> 1.2.4 Forward referencing
>>>
>>> This request is only for counted_by attribute of pointers. Since the
>>> flexible array members (FAM) are always the last field of the containing
>>> structure, forward reference issue does not exist for counted_by of FAM.
>>>
>>> How should we handle the situation when the counted_by attribute refers to
>>> a member variable that is declared after the pointer field in the
>>> structure?
>>>
>>> For example:
>>>
>>> struct bar {
>>>   char *array __attribute__ ((counted_by(??count)));
>>>   int count;
>>> };
>>>
>>> in the above, how can we refer to the field "count" that is declared after
>>> the pointer field "array" in the structure?
>
> We should be able to refer to an undeclared field anyway even with
> “__self.”, no?
> “__self.” doesn’t solve the problem that you should still be able to
> forward reference a member.
>
>>>
>>> 2. The requirement:
>>>
>>> This is an extension to C language. We should avoid adding a new scope of
>>> variable (as the current syntax of the counted_by attribute for FAM does)
>>> that breaks the existing legal C code. We should follow the default C
>>> language scoping rules and keep the current valid C code working properly.
>
> We have a way to not change the meaning of the existing code without
> introducing a new syntax, but diagnosing already error-prone code that should
> apply to both VLAs and bounds annotations. We are planning to write up a
> proposal to the C standard soon.
>
>>>
>>> 3. The proposed new syntax:
>>>
>>> * Keep the default C scoping rules.
>>>
>>> * Introduce a new keyword, __self, to represent the new concept, "the
>>>   current object" of the nearest non-anonymous enclosing structure, which
>>>   allows the object of the structure to refer to its own member inside
>>>   the structure definition. This is similar to the concept of "this" in
>>>   C++, except that __self should be treated as a special variable but not
>>>   a pointer.
>>>
>>> * With the new keyword, __self, the member variable can be referenced by
>>>   appending the member access operator "." to "__self", such as,
>>>   __self.member. This is similar to referring to a member variable
>>>   through a variable with the structure type in the C language.
>>>
>>> * This new keyword is invalid except in the bounds checking attributes,
>>>   such as "counted_by", etc., inside a structure definition.
>>>
>>> * Simple expression is allowed inside the attribute counted_by with the
>>>   following limitation:
>>>
>>>   A. no side-effect is allowed,
>>>   and
>>>   B. the operators of the expression are simple arithmetic operators, and
>>>      the operands could be one of:
>>>      B.1 __self.member or __self.member1.member2...(for nested structure);
>>>      B.2 constant;
>>>      B.3 locals that will not be changed after initialization;
>>>      B.4 globals that will not be changed after initialization;
>>>
>>> With the new syntax, the problems 1.1, 1.2.1 and 1.2.2 and 1.2.3 can be
>>> resolved naturally as following:
>>>
>>> 3.1 Legal C code with VLA works correctly when mixing with counted_by
>>>
>>> The previously bad code mixing with VLA is now:
>>>
>>> void boo (int k)
>>> {
>>>   const int n = 10; // a local variable n
>>>   struct foo {
>>>     int n; // a member variable n
>>>     int a[n + 10]; // for VLA, this n refers to the local variable n.
>>>     char b[] __attribute__ ((counted_by(__self.n)));
>>>     // for counted_by, this __self.n refers to the member variable n.
>>>   };
>>> }
>>>
>>> Now, We keep the default C scoping rule and make the counted_by referring
>>> to member variable in the same structure correctly without ambiguity.
>>>
>>> 3.2 Satisfy all the new requests
>>>
>>> With this new syntax, all the new requests in section 1.2 (except 1.2.4
>>> Forward referencing) are resolved naturally.
>>>
>>> 3.2.1 Refer to a field in the nested structure
>>>
>>> struct Y {
>>>   int n;
>>>   int other;
>>> };
>>>
>>> struct Z {
>>>   struct Y y;
>>>   int *array __attribute__ ((counted_by(__self.y.n)));
>>> };
>>>
>>> 3.2.2 Refer to globals or locals
>>>
>>> int count;
>>> struct X {
>>>   char others;
>>>   char array[] __attribute__ ((counted_by(count)));
>>> };
>>>
>>> Since the new syntax keeps the default scoping rule of C language, the
>>> "count" without any prefix inside the counted_by attribute refers to the
>>> current visible variable in the current scope, that is the global
>>> variable "count".
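
The default C lookup rule that section 3.2.2 relies on can be checked without
any attribute at all. The sketch below is only an illustration under stated
assumptions: it uses a file-scope enum constant named `n` instead of the local
`const int n` from the thread, so that the array size stays a constant
expression, and the helper name `foo_a_len` is hypothetical. Inside the struct
body, the unqualified `n` in the array size resolves to the ordinary
identifier from the enclosing scope, not to the member with the same name.

```c
#include <assert.h>
#include <stddef.h>

enum { n = 4 };        /* ordinary identifier n at file scope */

struct foo {
    int n;             /* member n: lives in the struct's own name space */
    int a[n + 10];     /* this n is the enum constant, so a has 14 elements */
};

/* Compile-time check that the size used the enclosing-scope n. */
static_assert(sizeof(((struct foo *)0)->a) / sizeof(int) == 14,
              "array size must come from the file-scope n");

/* Hypothetical helper returning the element count of foo.a. */
size_t foo_a_len(void)
{
    struct foo f;
    return sizeof f.a / sizeof f.a[0];
}
```

This shows only how standard C resolves the name today; how the same spelling
should behave inside counted_by is the open question in this thread.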
>>>
>>> 3.2.3 Represent simple expression
>>>
>>> When we can distinguish globals/locals from the member variables with this
>>> new syntax, simple expressions are represented naturally:
>>>
>>> int elm_size;
>>> struct X {
>>>   int count;
>>>   int *array __attribute__ ((counted_by(__self.count * elm_size)));
>>> };
>>>
>>> More complicated example:
>>>
>>> struct foo {
>>>   int n;
>>>   float f;
>>> };
>>>
>>> A.
>>> #define NETLINK_HEADER_BYTES 8
>>> struct bar1 {
>>>   struct foo y[5][10];
>>>   char *array __attribute__ ((counted_by(__self.y[1][3].n -
>>>                                          NETLINK_HEADER_BYTES)));
>>> };
>>>
>>> B.
>>> struct bar2 {
>>>   int n;
>>>   char *array __attribute__ ((counted_by((struct foo){.n = 4 }.n)));
>>> };
>>>
>>> C.
>>> struct bar3 {
>>>   int n;
>>>   char *array __attribute__ ((counted_by((struct foo){.n = 4 }.n +
>>>                                          __self.n)));
>>> };
>>>
>>>
>>> 3.3 How to resolve the forward reference issue in section 1.2.4?
>>>
>>> The new syntax naturally resolved all the problems we listed in section
>>> 1.2 except the forward reference issue:
>>>
>>> If the member variable that is referred inside the counted_by is declared
>>> after the pointer field with the counted_by attribute, such as:
>>>
>>> struct bar {
>>>   char *array __attribute__ ((counted_by(__self.count)));
>>>   int count;
>>> };
>>>
>>> In the above code, when "__self.count" is referred, its declaration is not
>>> available, compiler doesn't know its type yet.
>>>
>>> If it is a regular global or a local variable, this is a source code
>>> error, C FE reports an error and aborts. User should fix this coding error
>>> by adding the declaration of the variable before its first reference in
>>> the source code.
>>>
>>> Theoretically, in C, we should treat this as a source code error too.
>>> However, due to existing cases in the application (i.e, Linux Kernel), in
>>> order to avoid the source code change which might be painful or impossible
>>> due to existing ABI, can we accept such cases and handle it in compiler?
>>>
>>> I think this might be doable during the implementation of the counted_by
>>> attribute in C FE:
>>>
>>> A. when C FE parses the new keyword __self, the whole containing structure
>>>    has not yet been seen completely, as a result, the FE has to insert a
>>>    placeholder for __self, and delay the real IR generation after the
>>>    whole structure being parsed.
>>>    So, a small late handling ONLY for this placeholder _cannot_ be avoided.
>>>
>>> B. Then during this late handling of the placeholder, the C FE already
>>>    parses the whole structure, the declaration of the field is known at
>>>    that time, the forward reference issue can be resolved naturally.
>>>
>>> This can be illustrated in the following small example:
>>>
>>> struct bar {
>>>   char *array __attribute__ ((counted_by(__self.count)));
>>>   /* We haven't encountered 'count' yet, so we assume it's something like
>>>      'size_t' for now when inserting the placeholder for "__self". */
>>>   int count;
>>> }; /* At this point, we know everything about the struct, we can handle
>>>       the placeholder for "__self" and also go back and use 'int' for
>>>       the type to refer count */
>>>
>>>
>>> Appendix A: Scope of variables in C and C++
>>>             --The hints to the design of counted_by in C
>>>
>>> Scope of a variable defines the region of the code in which this variable
>>> can be accessed and modified.
>>>
>>> 1. What's common on the scope of variables between C and C++?
>>>
>>> **First, there are mainly two types of variable scopes:
>>>
>>> A. Global Scope
>>> The global scope refers to the region outside any function or block. The
>>> variables declared here are accessible throughout the entire program
>>> and are called Global Variables.
>>>
>>> B. Local Scope
>>> The local scope refers to the region enclosed between the { } braces,
>>> which represent the boundary of a function or a block inside functions.
>>> The variables declared within a function or a block are only accessible
>>> locally inside that function or that block and other blocks nested inside.
>>>
>>> NOTE 1: the {} brace that mark the boundary of a structure/class does
>>> not change whether the current scope is global or local.
>>>
>>> **Second, if two variables with same name are defined in different scopes,
>>> one in local scope and the other in global scope, the precedence is given
>>> to the local variable:
>>>
>>> [opc@qinzhao~]$ cat t1.c
>>> // Global variable
>>> int a = 5;
>>> int main() {
>>>   // Local variable with same name as that of
>>>   // global variable
>>>   int a = 100;
>>>   // Accessing a
>>>   __builtin_printf ("a is %d\n", a);
>>>   return 0;
>>> }
>>> [opc@qinzhao~]$ gcc t1.c; ./a.out
>>> a is 100
>>> [opc@qinzhao~]$ g++ t1.c; ./a.out
>>> a is 100
>>>
>>>
>>> 2. What's different on the scope of variables between C and C++?
>>>
>>> C++ has 3 additional variations of scopes:
>>>
>>> A. Instance Scope (member scope):
>>>
>>> The instance scope, also called member scope, refers to the region inside
>>> a class/structure but outside any member function of the class/structure.
>>> The variables, i.e, the data members, declared here are accessible to the
>>> whole class/structure. They can be accessed by the object (i.e., the
>>> instance) of the class/structure.
>>>
>>> [opc@qinzhao~]$ cat t2.C
>>> struct foo {
>>>   int bar1(void) { return m; }; // m refers to the member variable
>>>   int bar2(void) { int m = 20; return m; }; // return m refers to the
>>>                                             // local variable m = 20
>>>   int bar3(void) { int m = 30; return this->m; }; // this->m refers to
>>>                                                   // the member variable
>>>   foo (int val) { m = val; }; // m refers to the member variable
>>>   int m; // Member variable with instance scope, accessible to the whole
>>>          // structure/class
>>> };
>>>
>>> int main ()
>>> {
>>>   struct foo f(10);
>>>   __builtin_printf (" bar1 is %d \n", f.bar1());
>>>   __builtin_printf (" bar2 is %d \n", f.bar2());
>>>   __builtin_printf (" bar3 is %d \n", f.bar3());
>>>   return 0;
>>> }
>>> [opc@qinzhao~]$ g++ t2.C; ./a.out
>>>  bar1 is 10
>>>  bar2 is 20
>>>  bar3 is 10
>>>
>>> Explanation: The member variable "m" is declared inside the structure
>>> "foo" but outside any member function of "foo", it has instance scope.
>>> This variable is visible to all the member functions of the structure
>>> "foo". when there is a name conflict with a local variable inside a member
>>> function, for example, "bar2", the local variable has higher precedence.
>>> When trying to explicitly refer to the member variable in the member
>>> function, adding the C++ "this" pointer before it, for example, "bar3".
>>>
>>> NOTE 2: the {} brace that marks the boundary of a structure/class changes
>>> the variable scope to "instance scope" in C++.
>>>
>>> B. Static Member Scope
>>>
>>> The static member scope refers to variables declared with the static
>>> keyword within the class/structure. These variables can be accessed using
>>> the class name without creating the instance.
>>>
>>> [opc@qinzhao~]$ cat t3.C
>>> struct foo {
>>>   static int m; // Static member variable with static member scope,
>>>                 // accessible in whole structure/class
>>> };
>>> int foo::m = 10;
>>> int main ()
>>> {
>>>   __builtin_printf (" foo::m is %d\n", foo::m);
>>>   return 0;
>>> }
>>> [opc@qinzhao~]$ g++ t3.C; ./a.out
>>>  foo::m is 10
>>>
>>> NOTE 3: static member in structure is not available in C.
>>>
>>> C. Namespace Scope
>>>
>>> A namespace in C++ is a container that allows users to create a separate
>>> scope where the given variables are defined. It is used to avoid name
>>> conflicts and group related code together. These variables can be accessed
>>> using their namespace name and scope resolution operator.
>>>
>>> [opc@qinzhao~]$ cat t4.C
>>> namespace foo {
>>>   int m = 10; // Namespace scope variable
>>> };
>>> int main ()
>>> {
>>>   __builtin_printf (" foo::m is %d\n", foo::m);
>>>   return 0;
>>> }
>>> [opc@qinzhao~]$ g++ t4.C; ./a.out
>>>  foo::m is 10
>>>
>>> NOTE 4: namespaces are not available in C language.
>>>
>>> 3. A simple summary comparing C to C++
>>>
>>> A. there are only two variable scopes in C:
>>>
>>>    global scope
>>>    local scope
>>>
>>> all the other 3 variant variable scopes in C++, i.e., instance scope
>>> (member scope), static member scope, namespace scope, are not available
>>> in C.
>>>
>>> Since there is no static member and namespace in C language, accessing to
>>> static member variables of a structure or variables declared in another
>>> namespace is not needed in C at all.
>>>
>>> NOTE 5: However, accessing the member of a structure inside the structure
>>> is needed for the purpose of counted_by extension in C.
>>>
>>> B. the {} brace that represents the boundary of the structure does not
>>> change the scope of the variable in C since C doesn't have instance scope
>>> (i.e., member scope);
>>>
>>> The following examples can show these limitation in C language.
>>>
>>> C currently support variable length array (VLA), whose array size could be
>>> a variable expression. VLA is only supported in local scopes in C.
>>>
>>> [opc@qinzhao~]$ cat t5.c
>>> void boo (int k)
>>> {
>>>   const int n = 10;
>>>   struct foo {
>>>     int m;
>>>     int a[n + k];
>>>   };
>>> }
>>> [opc@qinzhao~]$ gcc t5.c -S
>>>
>>> Explanation: This is good. The {} brace that marks the boundary of the
>>> structure "foo" does NOT change the scope of the variable n and k, their
>>> definitions reach the declaration of the array member field a[n + k].
>>>
>>> However, when changing the testing case as:
>>> [opc@qinzhao~]$ cat t6.c
>>> void boo (int k)
>>> {
>>>   const int n = 10;
>>>   struct foo {
>>>     int m;
>>>     int a[n + m];
>>>   };
>>> }
>>> [opc@qinzhao~]$ gcc t6.c -S
>>> t6.c: In function ‘boo’:
>>> t6.c:6:15: error: ‘m’ undeclared (first use in this function)
>>>     6 |     int a[n + m];
>>>       |               ^
>>>
>>> Explanation: C does not have the concept of instance scope (member scope),
>>> there is no syntax provided to access the instance scope (member scope)
>>> variables inside the structures. Therefore, the reference to the member
>>> variable "m" inside the declaration of the array member field a[n + m] is
>>> not visible.
>>>
>>> 4. What's the possible approaches for the counted_by attribute as a C
>>>    extension.
>>>
>>> The major thing for this extension is:
>>> Adding a new language feature in C to access the member variables inside a
>>> structure.
>>>
>>> Based on the previous comparison between C and C++, there are two possible
>>> approaches:
>>>
>>> A. Add a new variable scope: instance scope (member scope) into C
>>>
>>> The definition of the new instance scope of C is:
>>>
>>> The instance scope, also called member scope, refers to the region inside
>>> a structure. The variables, i.e, the members, declared here are accessible
>>> to the whole structure. They can be accessed by the object (i.e., the
>>> instance) of the structure.
>>>
>>> The {} brace that marks the boundary of a structure will change the
>>> variable scope to "instance scope"; a variable name confliction between
>>> other scopes (including global/local) and instance scope will give
>>> precedence to instance scope.
>>>
>>> The compiler's implementation on this approach could be:
>>> ** a new variable scope, "instance scope" is added into C FE;
>>> ** the "instance scope" has the higher precedence than the current
>>>    global/local scope;
>>> ** the {} brace for the boundary of a structure is the boundary for the
>>>    "instance scope";
>>> ** a member variable that is referenced inside this structure could be
>>>    treated as this->member.
>>> ** reference to a global variable inside the structure need a new syntax.
>>>
>>> B. Add a new syntax to access instance scope (member scope) variable
>>>    within the structure while keeping C's default scoping rules.
>>>
>>> The {} brace that marks the boundary of a structure will NOT change the
>>> variable scope. There are still only two variable scoping, global and
>>> local.
>>>
>>> In order to explicitly access a member inside a structure, a new syntax
>>> need to be added. This new syntax could reuse the current designator
>>> syntax in C (prefixing the member variable with "."), or adding a new
>>> keyword similar as "this", such as, "__self", and prefixing the member
>>> variable with "__self."
>>>
>>> With the above approach A, we can keep the current syntax for counted_by;
>>> but not sure how easy to extend it for simple expression and nested
>>> structure.
>>>
>>> However, the major problem with this approach is: it changes the default
>>> scoping rule in C languages. this additional variable scoping will break
>>> existing legal C code:
>>>
>>> [opc@qinzhao~]$ cat t7.c
>>> void boo (int k)
>>> {
>>>   const int n = 10; // a local variable n
>>>   struct foo {
>>>     int n; // a member variable n
>>>     int a[n + 10]; // currently, this n refers to the local variable n.
>>>   };
>>> }
>>>
>>> When we take the approach A, within the structure "foo", the VLA a[n+10]
>>> will refer to the member variable n, but not the local variable n anymore.
>>> The existing code with VLA might work incorrectly.
>>>
>>> You can argue to only add the new variable scope for counted_by attribute,
>>> not for VLA, then how to handle the following case:
>>>
>>> [opc@qinzhao~]$ cat t8.c
>>> void boo (int k)
>>> {
>>>   const int n = 10; // a local variable n
>>>   struct foo {
>>>     int n; // a member variable n
>>>     int a[n + 10]; // for VLA, this n refers to the local variable n.
>>>     char *b __attribute__ ((counted_by(n + 10)));
>>>     // for counted_by, this n refers to the member variable n.
>>>   };
>>> }
>>>
>>> This will be a disaster.
>>>
>>> So, I think that the approach A is not the right direction for a C
>>> extension.
>>>
>>> With the above approach B, a new syntax need to be implemented,
>>> and all the previous source code change in the application need to be
>>> modified.
>>>
>>> But I still think that approach B is the right direction to go.
>>> (Please refer to:
>>> ******Scope of variables in C++
>>> https://www.geeksforgeeks.org/scope-of-variables-in-c/
>>> ******Scope of variables in C
>>> https://www.geeksforgeeks.org/scope-rules-in-c/)
>>>
>>>
>>> Appendix B: An example in linux kernel that the global cannot be "const"
>>>             qualified
>>>
>>> In linux kernel, the globals that will be referred inside counted_by
>>> attribute don’t change value, but they cannot be marked "const" since they
>>> are initialized during very early kernel boot.
>>>
>>> they _become_ architecturally read-only. i.e. they are in a memory region
>>> that is flipped to read-only after boot is finished.
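
As a rough userspace analogy for Appendix B, the sketch below shows a global
that cannot be declared const because it is assigned once during start-up,
yet is never changed afterwards. Everything in it is hypothetical: the names
`nr_slots`, `init_nr_slots` and `get_nr_slots` are invented, and the software
"sealed" flag merely stands in for the memory-protection change the kernel
performs after boot (the kernel annotates such variables `__ro_after_init`
and enforces this in hardware, which this sketch does not attempt).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global that a counted_by expression might refer to.
   It cannot be const: its value is only known during start-up. */
static size_t nr_slots;
static bool nr_slots_sealed;   /* stand-in for "flipped to read-only" */

/* Set the value once; any later attempt is refused. */
bool init_nr_slots(size_t value)
{
    if (nr_slots_sealed)
        return false;          /* already initialized, keep old value */
    nr_slots = value;
    nr_slots_sealed = true;
    return true;
}

/* Read-only access for the rest of the program. */
size_t get_nr_slots(void)
{
    assert(nr_slots_sealed);   /* reading before init is a bug */
    return nr_slots;
}
```

This is the property items B.3 and B.4 of the proposal ask the user to
guarantee: the value is stable after initialization even though the type
system does not say so.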