+ Aaron

> On Mar 6, 2025, at 4:27 PM, Yeoul Na <yeoul...@apple.com> wrote:
> 
> Hi Qing,
> 
> Thanks for writing up the RFC and keeping us in the loop. Are you planning to 
> add “__self.” to GCC's C++ compiler as well in the future? The problem we 
> have with “__self” being the default way of annotating bounds is C++ 
> compatibility: bounds annotations are supposed to work in headers shared 
> between C and C++, and C++ should be able to parse them to secure the 
> boundary between the two languages. Another problem is usability. The user 
> will have to write the extra “__self.” all the time in the most common use 
> cases, which would be a huge regression for the usability of the language.
> 
> We are planning to write up an alternative proposal that does not introduce 
> a new syntax to the C standard. We’ll discuss how we address the problems 
> raised here. Please see my inlined comments.
> 
> Best,
> Yeoul
> 
> 
>> On Mar 6, 2025, at 2:03 PM, Yeoul Na <yeoul...@apple.com> wrote:
>> 
>> + John & Félix & Patryk & Henrik
>> 
>>> On Mar 6, 2025, at 1:44 PM, Qing Zhao <qing.z...@oracle.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Since I sent the patch series for “extend counted_by attribute to pointer 
>>> fields of structure” two months ago, a lot of discussion has taken place in 
>>> both the GCC community and the Clang community:
>>> 
>>> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html
>>> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/131?u=gwelymernans
>>> 
>>> After reading these discussions, studying the issues, and discussing 
>>> further until the whole picture became clearer, we came up with a proposal 
>>> to change the current design and add a new syntax for the argument of the 
>>> counted_by attribute.
>>> 
>>> The original idea of the new syntax came from Joseph, Michael and Martin. 
>>> Bill and Kees were involved in the whole process of the proposal and 
>>> provided a lot of suggestions and comments. I really appreciate the help 
>>> from all of them.
>>> 
>>> In this thread, I am also CC’ing several people from Apple who worked on 
>>> the -fbounds-safety project on the Clang side: yeoul...@apple.com, 
>>> d_tard...@apple.com, dl...@apple.com, and dcough...@apple.com.
>>> 
>>> Please take a look at the proposal below.
>>> 
>>> Let me know if you have any comments and suggestions.
>>> 
>>> Thanks.
>>> 
>>> Qing.
>>> 
>>> =========================================
>>> 
>>> New syntax for the argument of counted_by attribute
>>> --An extension to C language  
>>> 
>>> Outline
>>> 
>>> 0. A simple summary of the proposal
>>> 
>>> 1. The motivation
>>> 1.1 The current syntax of the counted_by argument might break existing 
>>> legal C code
>>> 1.2 New requests from the users of the counted_by attribute
>>> 1.2.1 Refer to a field in the nested structure
>>> 1.2.2 Refer to globals or locals
>>> 1.2.3 Represent simple expression
>>> 1.2.4 Forward referencing
>>> 
>>> 2. The requirement
>>> 
>>> 3. The proposed new syntax
>>> 3.1 Legal C code with VLA works correctly when mixing with counted_by
>>> 3.2 Satisfy all the new requests
>>> 3.2.1  Refer to a field in the nested structure
>>> 3.2.2 Refer to globals or locals
>>> 3.2.3 Represent simple expression
>>> 3.3 How to resolve the forward reference issue in section 1.2.4?
>>> 
>>> Appendix A: Scope of variables in C and C++
>>>    --The hints to the design of counted_by in C
>>> Appendix B: An example in linux kernel that the global cannot be "const" 
>>> qualified
>>> 
>>> 
>>> 0. A simple summary of the proposal
>>> 
>>> We propose a new syntax for the argument of the counted_by attribute:  
>>> * Introduce a new keyword, __self, to represent the new concept, 
>>> "the current object" of the nearest non-anonymous enclosing structure, 
>>> which allows the object of the structure to refer to its own member inside
>>> the structure definition.  
>>> 
>>> * With the new keyword, __self, the member variable can be referenced 
>>>  by appending the member access operator "." to "__self", such as, 
>>>  __self.member.
>>> 
>>> * This new keyword is invalid except in the bounds checking attributes, 
>>>  such as "counted_by", etc., inside a structure definition.
>>> 
>>> * Simple expressions are enabled by this new keyword inside the counted_by 
>>>  attribute, with the following limitations:
>>> A. no side-effect is allowed;
>>> and
>>> B. the operators of the expression are simple arithmetic operators, and the 
>>>   operands could be one of:
>>> B.1 __self.member or __self.member1.member2...(for nested structure);
>>> B.2 constant;
>>> B.3 locals that will not be changed after initialization;
>>> B.4 globals that will not be changed after initialization;
>>> 
>>> 
>>> 1. The motivation  
>>> 
>>> There are two major motivations for this new syntax.  
>>> 
>>> 1.1 The current syntax of the counted_by argument might break existing 
>>> legal C code
>>> 
>>> The counted_by attribute is currently defined as:  
>>> (https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-counted_005fby-variable-attribute)
>>> 
>>> counted_by (count)
>>> The counted_by attribute may be attached to the C99 flexible array member 
>>> of a structure. It indicates that the number of the elements of the array 
>>> is 
>>> given by the field "count" in the same structure as the flexible array 
>>> member.
>>> 
>>> For example:
>>> 
>>> int count;
>>> struct X {
>>> int count;
>>> char array[] __attribute__ ((counted_by (count)));  
>>> };
>>> 
>>> In the above, the argument of the attribute "count" is an identifier that 
>>> will be 
>>> looked up in the scope of the enclosing structure "X". Due to this new 
>>> scope 
>>> of variable, the identifier "count" refers to the member variable "count" 
>>> of this 
>>> structure but not the global variable defined outside of the structure.   
>>> 
>>> This is a new scope of variable that is added to the C language. In C, the 
>>> default available scopes of variable include only two scopes, global scope 
>>> and local scope.  
>>> 
>>> The global scope refers to the region outside any function or block. 
>>> The variables declared here are accessible throughout the entire program.
>>> 
>>> The local scope refers to the region enclosed between the { } braces, which 
>>> represent the boundary of a function or a block inside functions. The 
>>> variables 
>>> declared within a function or a block are only accessible locally inside 
>>> that 
>>> function or that block and other blocks nested inside.
>>> 
>>> (Please see Appendix A for more details on scope of variables in C and C++ 
>>> and why the current design of counted_by attribute is a disaster to C)
>>> 
>>> Note, the { } brace that marks the boundary of a structure does not change
>>> the current scope of the variable with the default scoping rules in C.
>>> 
>>> As a result, in the above example, with C's default scoping rule, the 
>>> "count" inside the counted_by attribute _should_ refer to the global 
>>> variable "count", not the member variable in the enclosing structure.
>>> 
>>> A more compelling example is shown below when mixing counted_by attribute 
>>> with C's Variable Length Array (VLA).
>>> 
>>> void boo (int k)
>>> {
>>> const int n = 10; // a local variable n
>>> struct foo {
>>>  int n;          // a member variable n
>>>  int a[n + 10];  // for VLA, this n refers to the local variable n.
>>>  char b[] __attribute__ ((counted_by(n)));
>>>    // for counted_by, this n refers to the member variable n.
>>> };
>>> }
>>> 
>>> This code is bad. The size expression "n + 10" of the VLA "a" follows the 
>>> default scoping rule of C, so "n" refers to the local variable "n" that is 
>>> defined outside of the structure "foo". However, the argument "n" of the 
>>> counted_by attribute of the flexible array member b[] follows the new 
>>> scoping rule and refers to the member variable "n" inside this structure.
>>> 
>>> It's clear that the current design of the counted_by argument introduced a 
>>> new scoping rule into C, resulting in inconsistent scope resolution in the 
>>> C language.
>>> 
>>> This is a design mistake, and should be fixed.
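
A minimal sketch of the default lookup described in 1.1. It assumes an enum constant in place of the local variable, so the array is not a VLA and the example stays within standard C at file scope. The helper names foo_a_len and check_default_scoping are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { n = 4 };        /* ordinary identifier n, visible at file scope */

struct foo {
    int n;             /* member n: struct members have their own name space */
    int a[n + 10];     /* plain C lookup finds the enum constant, not the member */
};

/* Element count of foo.a, taken from the type alone (sizeof does not evaluate). */
size_t foo_a_len(void)
{
    return sizeof(((struct foo *)0)->a) / sizeof(int);
}

/* Self-check: the bound was computed from the outer n (4), giving 14 elements. */
void check_default_scoping(void)
{
    assert(foo_a_len() == 14);
}
```

With today's C rules the struct braces do not shadow the outer name, which is why an unqualified name inside counted_by that binds to the member is a departure from the rule used for the array size.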
> 
> We will have a different proposal based on reporting diagnostics on name 
> conflicts. We need to diagnose name conflicts like the one above anyway, 
> because in code like that the struct almost always contains a buffer and its 
> size as fields. The program’s intention is therefore more likely to be to 
> pick up the member `n` than some random global that happens to have the same 
> name in the same translation unit. We should diagnose such cases to avoid 
> mistakes and to keep the program from silently working in an unintended way 
> because of a user mistake. Also, this program will have a different meaning 
> in C++, which is another reason to always diagnose such ambiguity. 
> Also, the bounds annotation user might have just forgotten to add “__self.” 
> because it’s so intuitive to use the member name inside the attributes (I 
> know what’s “intuitive” depends on people’s background, but that’s what we 
> observed from massive adoption experience within Apple). This leaves the 
> feature error-prone, because the most intuitive syntax for bounds annotations 
> will be compiled into a different meaning (using the global as the size 
> instead of the peer member). So we should really diagnose it even if we add 
> “__self” to avoid the mistake.
> 
> Now, if we always diagnose it, then the lookup order doesn’t really matter 
> anymore. That means we will have an option to keep the current lookup rule of 
> C, and pick up the member name only when the global name is not available 
> (just one possible option). I see “__self.” being used as a suppression 
> mechanism if the programmer cannot change the name of the conflicting global 
> or member. But that doesn’t mean “__self” should be the default way of 
> writing the code. Suppression mechanisms are typically only used to suppress 
> warnings and disambiguate. And this would mean we also need a way to 
> disambiguate it to mean the global. C++ already has `::`, but C doesn’t 
> currently have a scope qualifier, so in order to use this new bounds safety 
> feature we may need to invent something. Adding a new syntax is a risk, so 
> until we standardize it I would suggest something like 
> `__builtin_global_ref()`.
> 

Another thing to note: Clang has already started adding such diagnostics to 
avoid the ambiguity. Here is the PR from Aaron: 
https://github.com/llvm/llvm-project/pull/129772
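
To make the "always diagnose" idea concrete, here is a toy model of the policy. It is only a sketch under stated assumptions: it is not how the PR above or any compiler implements the check, and the names ref_kind, contains and classify_ref are hypothetical. A name used inside a bounds attribute is flagged as ambiguous when it is both a member of the struct and a visible outer (global or local) name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of looking up one identifier used inside counted_by. */
enum ref_kind { REF_UNDECLARED, REF_MEMBER, REF_OUTER, REF_AMBIGUOUS };

/* Return 1 if name occurs in names[0..count), else 0. */
int contains(const char *const *names, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Classify name against the struct's members and the visible outer names.
   A name found in both sets is reported as ambiguous instead of being
   silently bound to either one, so the lookup order no longer matters. */
enum ref_kind classify_ref(const char *name,
                           const char *const *members, size_t n_members,
                           const char *const *outer, size_t n_outer)
{
    assert(name != NULL);
    int is_member = contains(members, n_members, name);
    int is_outer = contains(outer, n_outer, name);

    if (is_member && is_outer)
        return REF_AMBIGUOUS;   /* diagnose: rename or disambiguate */
    if (is_member)
        return REF_MEMBER;
    if (is_outer)
        return REF_OUTER;
    return REF_UNDECLARED;
}
```

In this model, "count" in the struct X example from 1.1 would come back as REF_AMBIGUOUS, while an expression such as count * elm_size with no conflicting names would resolve each identifier without a diagnostic.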

> 
>>> 
>>> 1.2 New requests from the users of the counted_by attribute
>>> 
>>> The counted_by attribute for Flexible Array Member (FAM) has been adopted 
>>> extensively in the Linux kernel. New requests have come in to cover more 
>>> cases.
>>> 
>>> 1.2.1 Refer to a field in the nested structure
>>> 
>>> This request came from the Linux kernel.
>>> https://www.spinics.net/lists/linux-rdma/msg127560.html
>>> 
>>> A simplified testing case is:
>>> 
>>> struct Y {
>>> int n;
>>> int other;
>>> };
>>> 
>>> struct Z {
>>> struct Y y;
>>> int array[]  __attribute__ ((counted_by(?y.n)));
>>> };
>>> 
>>> In the above, what should be put instead of "?" to refer to the field "n" 
>>> of the field "y" of the current object of this struct Z?
>>> 
>>> NOTE, we should completely reject the use cases that refer to a field in an 
>>> outer structure from an inner non-anonymous structure, such as:
>>> 
>>> struct A {
>>> int count;
>>> struct B {
>>> int other;
>>> int z[] __attribute__ ((counted_by(?)));
>>> } b;
>>> };
>>> 
>>> In the above, we should not allow the counted_by "?" of the FAM field "z" 
>>> of the struct B to refer to the member variable "count" of the outer 
>>> struct A. Otherwise, when an object of type struct B is passed to a 
>>> function, there will be an error when referring to the counted_by of its 
>>> field "z".
>>> 
>>> However, the counted_by attribute of a field in an inner anonymous 
>>> structure should be allowed to refer to a field of the outer structure, 
>>> since the inner anonymous structure cannot be used independently of its 
>>> enclosing structure. For example:
>>> 
>>> struct A {
>>> int count;
>>> struct {
>>> int other;
>>> int z[] __attribute__ ((counted_by(count)));
>>> };
>>> } a;
>>> 
>>> In the above testing case, the counted_by attribute for the field "z" of 
>>> the inner 
>>> anonymous structure should be able to refer to the field of the outer 
>>> structure.
> 
> I couldn’t get the relation between the named nested struct and anonymous 
> struct here. Members of anonymous structure are essentially part of the outer 
> struct. And the members are already accessed the same as direct members of 
> the outer struct. It should work as below:
> 
> 
> struct A {
> int count;
> struct B {
> int other;
> // error: reference to undefined identifier `count`.
> int z[] __attribute__ ((counted_by(count)));
> } b;
> };
> 
> 
> struct A {
> int count;
> struct {
> int other;
> // works, as members of an anonymous structure are part of structure A
> int z[] __attribute__ ((counted_by(count)));
> };
> } a;
> 
> 
> So I don’t see why this will prevent us from doing (counted_by(y.n)) without 
> needing any additional prefix.
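
A small sketch of the anonymous-struct point, using only standard C11. It assumes a plain pointer in place of the annotated array so that it compiles without any bounds attribute, and the function name sum_z is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct A {
    int count;
    struct {            /* C11 anonymous struct: no tag and no declarator */
        int other;
        int *z;         /* stands in for the buffer that counted_by would annotate */
    };
};

/* Members of the anonymous struct are named exactly like direct members of
   struct A, so count and z are peers from the point of view of this code. */
int sum_z(const struct A *a)
{
    assert(a != NULL);
    int total = 0;
    for (int i = 0; i < a->count; i++)
        total += a->z[i];
    return total;
}
```

Because a->z and a->count are already spelled the same way as direct members, a bounds annotation that names count from inside the anonymous struct is consistent with how the members are accessed everywhere else.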
> 
> 
>>> 
>>> 
>>> 1.2.2 Refer to globals or locals  
>>> 
>>> One request from linux kernel is here:
>>> https://lore.kernel.org/all/202309221128.6AC35E3@keescook/
>>> 
>>> A simple example is:
>>> 
>>> int count; // global variable
>>> struct X {
>>> int count; // member variable
>>> char array[] __attribute__ ((counted_by(??count)));
>>>   //  How to refer to the global variable "count"
>>>   //  but not the member variable "count" of the struct X?
>>> };
>>> 
>>> When the counted_by attribute tries to refer to the global variable "count" 
>>> outside the structure, how do we distinguish it from the member variable 
>>> "count"?
> 
> Again, this should be diagnosed and the programmer either needs to change the 
> name or use a suppression mechanism. As I suggested earlier we can introduce 
> something like __builtin_global_ref(), until we get a blessing from the C 
> committee to add a scope qualifier syntax in C.
> 
>>> 
>>> NOTE: Users need to make sure that the global or local variables are not 
>>> changed after they are initialized; otherwise, the results of the array 
>>> bound sanitizer and of __builtin_dynamic_object_size are undefined.
>>> 
>>> Theoretically, we should limit the globals and locals ONLY to const 
>>> qualified globals and locals to avoid abuse of this feature in the future. 
>>> However, existing code in the Linux kernel cannot easily be changed to use 
>>> the const qualifier, so we have to relax the limitation. See Appendix B 
>>> for such an example in the Linux kernel.
>>> 
>>> In a future language extension, we should limit the globals and locals 
>>> ONLY to const qualified globals and locals.
>>> 
>>> 1.2.3 Represent simple expression
>>> 
>>> This was requested multiple times from Linux kernel. One of the requests is:
>>> https://lore.kernel.org/lkml/20210727205855.411487-63-keesc...@chromium.org/
>>> 
>>> For example:
>>> 
>>> int elm_size;
>>> struct X {
>>> int count;
>>> char array[] __attribute__ ((counted_by(?count * elm_size)));
>>> };
>>> 
>>> In the above, what should be put instead of "?" to represent this simple 
>>> expression?
> 
> It should just work without any prefix because there’s no name conflict here, 
> it will be clear what each unqualified name is referring to.
> 
> constexpr int elm_size;
> struct X {
> int count;
> char array[] __attribute__ ((counted_by(count * elm_size)));
> };
> 
> I think this is not too different from this:
> 
> int elem_size;
> int foo(void) {
>  int count;
>  return count * elem_size;
> }
> 
> 
> 
>>> 
>>> NOTE, We should limit simple expressions to:
>>> 
>>> A. no side-effect is allowed,
>>> and
>>> B. the operators of the expression are simple arithmetic operators, and the 
>>> operands
>>>   could be one of the following:
>>> B.1 the member variable of the enclosing structure or inner structure of 
>>> the enclosing structure;
>>> B.2 constant;
>>> B.3 locals that will not be changed after initialization;
>>> B.4 globals that will not be changed after initialization;      
>>> 
>>> 1.2.4 Forward referencing
>>> 
>>> This request is only for the counted_by attribute of pointers. Since a 
>>> flexible array member (FAM) is always the last field of the containing 
>>> structure, the forward reference issue does not exist for counted_by of 
>>> a FAM.
>>> 
>>> How should we handle the situation when the counted_by attribute refers to
>>> a member variable that is declared after the pointer field in the structure?
>>> 
>>> For example:
>>> 
>>> struct bar {
>>> char *array __attribute__ ((counted_by(??count)));
>>> int count;
>>> };
>>> 
>>> In the above, how can we refer to the field "count" that is declared after 
>>> the pointer field "array" in the structure?
> 
> We should be able to refer to an undeclared field anyway, even with 
> “__self.”, no? “__self.” doesn’t solve the problem that you should still be 
> able to forward reference a member.
> 
>>> 
>>> 2. The requirement:
>>> 
>>> This is an extension to the C language. We should avoid adding a new 
>>> variable scope (as the current syntax of the counted_by attribute for FAM 
>>> does) that breaks existing legal C code. We should follow the default C 
>>> language scoping rules and keep current valid C code working properly.
> 
> We have a way to avoid changing the meaning of existing code without 
> introducing a new syntax: diagnosing already error-prone code, which should 
> apply to both VLAs and bounds annotations. We are planning to write up a 
> proposal to the C standard soon.
> 
>>> 
>>> 3. The proposed new syntax:
>>> 
>>> * Keep the default C scoping rules.
>>> 
>>> * Introduce a new keyword, __self, to represent the new concept, "the 
>>>  current object" of the nearest non-anonymous enclosing structure, which 
>>>  allows the object of the structure to refer to its own member inside the 
>>>  structure definition. This is similar to the concept of "this" in C++, 
>>>  except that __self should be treated as a special variable, not a pointer.
>>> 
>>> * With the new keyword, __self, the member variable can be referenced by 
>>>  appending the member access operator "." to "__self", such as 
>>>  __self.member. This is similar to referring to a member variable through 
>>>  a variable of the structure type in the C language.
>>> 
>>> * This new keyword is invalid except in the bounds checking attributes, 
>>> such as 
>>> "counted_by", etc.,  inside a structure definition.
>>> 
>>> * Simple expressions are allowed inside the counted_by attribute with the 
>>>  following limitations:
>>> 
>>> A. no side-effect is allowed,
>>> and
>>> B. the operators of the expression are simple arithmetic operators, and the 
>>> operands 
>>>  could be one of:
>>> B.1 __self.member or __self.member1.member2...(for nested structure);
>>> B.2 constant;
>>> B.3 locals that will not be changed after initialization;
>>> B.4 globals that will not be changed after initialization;
>>> 
>>> With the new syntax, the problems 1.1, 1.2.1, 1.2.2 and 1.2.3 can be 
>>> resolved naturally as follows:
>>> 
>>> 3.1 Legal C code with VLA works correctly when mixing with counted_by
>>> 
>>> The previously bad code mixing with VLA is now:
>>> 
>>> void boo (int k)
>>> {
>>> const int n = 10; // a local variable n
>>> struct foo {
>>>  int n;          // a member variable n
>>>  int a[n + 10];  // for VLA, this n refers to the local variable n.
>>>  char b[] __attribute__ ((counted_by(__self.n)));
>>>    // for counted_by, this __self.n refers to the member variable n.
>>> };
>>> }
>>> 
>>> Now we keep the default C scoping rule, and counted_by refers to the 
>>> member variable in the same structure correctly and without ambiguity.
>>> 
>>> 3.2 Satisfy all the new requests
>>> 
>>> With this new syntax, all the new requests in section 1.2 (except 1.2.4 
>>> Forward 
>>> referencing) are resolved naturally.
>>> 
>>> 3.2.1 Refer to a field in the nested structure
>>> 
>>> struct Y {
>>> int n;
>>> int other;
>>> };
>>> 
>>> struct Z {
>>> struct Y y;
>>> int *array  __attribute__ ((counted_by(__self.y.n)));
>>> };
>>> 
>>> 3.2.2 Refer to globals or locals  
>>> 
>>> int count;
>>> struct X {
>>> char others;
>>> char array[] __attribute__ ((counted_by(count)));
>>> };
>>> 
>>> Since the new syntax keeps the default scoping rule of the C language, the 
>>> "count" without any prefix inside the counted_by attribute refers to the 
>>> variable visible in the current scope, that is, the global variable "count".
>>> 
>>> 3.2.3 Represent simple expression
>>> 
>>> When we can distinguish globals/locals from the member variables with this 
>>> new syntax, simple expressions are represented naturally:
>>> 
>>> int elm_size;
>>> struct X {
>>> int count;
>>> int *array __attribute__ ((counted_by(__self.count * elm_size)));
>>> };
>>> 
>>> More complicated example:
>>> 
>>> struct foo {
>>> int n;
>>> float f;
>>> };
>>> 
>>> A.
>>> #define NETLINK_HEADER_BYTES 8
>>> struct bar1 {
>>> struct foo y[5][10];
>>> char *array __attribute__ ((counted_by(__self.y[1][3].n - 
>>> NETLINK_HEADER_BYTES)));
>>> };
>>> 
>>> B.
>>> struct bar2 {
>>> int n;
>>> char *array __attribute__ ((counted_by((struct foo){.n = 4 }.n)));
>>> };
>>> 
>>> C.
>>> struct bar3 {
>>> int n;
>>> char *array __attribute__ ((counted_by((struct foo){.n = 4 }.n + 
>>> __self.n)));
>>> };
>>> 
>>> 
>>> 3.3 How to resolve the forward reference issue in section 1.2.4?
>>> 
>>> The new syntax naturally resolves all the problems we listed in section 1.2 
>>> except the forward reference issue:
>>> 
>>> If the member variable that is referred to inside counted_by is declared 
>>> after the pointer field with the counted_by attribute, such as:
>>> 
>>> struct bar {
>>> char *array __attribute__ ((counted_by(__self.count)));
>>> int count;
>>> };
>>> 
>>> In the above code, when "__self.count" is referenced, its declaration is 
>>> not yet available, so the compiler doesn't know its type.
>>> 
>>> If it were a regular global or local variable, this would be a source code 
>>> error: the C FE reports an error and aborts. The user should fix this 
>>> coding error by adding the declaration of the variable before its first 
>>> reference in the source code.
>>> 
>>> Theoretically, in C, we should treat this as a source code error too.  
>>> However, due to existing cases in applications (i.e., the Linux kernel), 
>>> and in order to avoid source code changes that might be painful or 
>>> impossible due to the existing ABI, can we accept such cases and handle 
>>> them in the compiler?
>>> 
>>> I think this might be doable during the implementation of the counted_by 
>>> attribute
>>> in C FE:
>>> 
>>> A. When the C FE parses the new keyword __self, the whole containing 
>>> structure has not yet been seen completely. As a result, the FE has to 
>>> insert a placeholder for __self and delay the real IR generation until 
>>> after the whole structure has been parsed. So, a small late handling ONLY 
>>> for this placeholder _cannot_ be avoided.
>>> 
>>> B. Then, during this late handling of the placeholder, the C FE has already 
>>> parsed the whole structure and the declaration of the field is known, so 
>>> the forward reference issue can be resolved naturally.
>>> 
>>> This can be illustrated in the following small example:
>>> 
>>> struct bar {
>>> char *array __attribute__ ((counted_by(__self.count)));      
>>>   /* We haven't encountered 'count' yet, so we assume it's something like
>>>     'size_t' for now when inserting the placeholder for "__self". */
>>> int count;
>>> };  /* At this point, we know everything about the struct, so we can handle
>>>     the placeholder for "__self" and also go back and use 'int' as
>>>     the type of count */
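
The two steps A and B above can be modelled with a toy two-pass resolver. This is only a sketch of the deferred-resolution idea, not GCC's or Clang's front-end code; struct field, struct record, add_field and finish_record are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 8

/* Hypothetical model of one parsed field. count_ref is the member name used
   in counted_by (NULL when the field has no attribute). */
struct field {
    const char *name;
    const char *type;
    const char *count_ref;   /* placeholder: a name that is not yet resolved */
    int count_index;         /* index of the resolved field, or -1 */
};

struct record {
    struct field fields[MAX_FIELDS];
    size_t n_fields;
};

/* Pass 1 (while parsing): record the field and leave any counted_by
   reference as an unresolved placeholder. */
void add_field(struct record *r, const char *name, const char *type,
               const char *count_ref)
{
    assert(r->n_fields < MAX_FIELDS);
    struct field *f = &r->fields[r->n_fields++];
    f->name = name;
    f->type = type;
    f->count_ref = count_ref;
    f->count_index = -1;
}

/* Pass 2 (at the closing brace): every field is known, so each placeholder
   can be bound regardless of declaration order. Returns the number of
   references that still could not be resolved. */
int finish_record(struct record *r)
{
    int unresolved = 0;
    for (size_t i = 0; i < r->n_fields; i++) {
        struct field *f = &r->fields[i];
        if (f->count_ref == NULL)
            continue;
        for (size_t j = 0; j < r->n_fields; j++) {
            if (strcmp(r->fields[j].name, f->count_ref) == 0) {
                f->count_index = (int)j;
                break;
            }
        }
        if (f->count_index < 0)
            unresolved++;
    }
    return unresolved;
}
```

For struct bar, adding "array" (referring to "count") and then "count" leaves the reference unresolved after pass 1; finish_record binds it to field 1, whose type "int" is known by then.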
>>> 
>>> 
>>> Appendix A: Scope of variables in C and C++  
>>> --The hints to the design of counted_by in C
>>> 
>>> Scope of a variable defines the region of the code in which this variable 
>>> can 
>>> be accessed and modified.  
>>> 
>>> 1. What's common on the scope of variables between C and C++?
>>> 
>>> **First, there are mainly two types of variable scopes:  
>>> 
>>> A. Global Scope
>>> The global scope refers to the region outside any function or block. The 
>>> variables declared here are accessible throughout the entire program 
>>> and are called Global Variables.
>>> 
>>> B. Local Scope
>>> The local scope refers to the region enclosed between the { } braces, 
>>> which represent the boundary of a function or a block inside functions. 
>>> The variables declared within a function or a block are only accessible 
>>> locally inside that function or that block and other blocks nested inside.  
>>> 
>>> NOTE 1: the {} brace that marks the boundary of a structure/class does 
>>> not change whether the current scope is global or local.
>>> 
>>> **Second, if two variables with same name are defined in different scopes, 
>>> one in local scope and the other in global scope, the precedence is given 
>>> to the local variable:
>>> 
>>> [opc@qinzhao~]$ cat t1.c
>>> // Global variable
>>> int a = 5;
>>> int main() {
>>> // Local variable with same name as that of
>>> // global variable
>>> int a = 100;
>>> // Accessing a
>>> __builtin_printf ("a is %d\n", a);
>>> return 0;
>>> }
>>> [opc@qinzhao~]$ gcc t1.c; ./a.out
>>> a is 100
>>> [opc@qinzhao~]$ g++ t1.c; ./a.out
>>> a is 100
>>> 
>>> 
>>> 2. What's different on the scope of variables between C and C++?
>>> 
>>> C++ has 3 additional variations of scopes:
>>> 
>>> A. Instance Scope (member scope):
>>> 
>>> The instance scope, also called member scope, refers to the region inside 
>>> a class/structure but outside any member function of the class/structure. 
>>> The variables, i.e, the data members, declared here are accessible to the 
>>> whole class/structure. They can be accessed by the object (i.e., the 
>>> instance) 
>>> of the class/structure.   
>>> 
>>> [opc@qinzhao~]$ cat t2.C
>>> struct foo {
>>> // m refers to the member variable
>>> int bar1(void) { return m; }
>>> // m refers to the local variable m = 20
>>> int bar2(void) { int m = 20; return m; }
>>> // this->m refers to the member variable
>>> int bar3(void) { int m = 30; return this->m; }
>>> // m refers to the member variable
>>> foo (int val) { m = val; }
>>> // Member variable with instance scope, accessible to the whole
>>> // structure/class
>>> int m;
>>> };
>>> 
>>> int main ()
>>> {
>>> struct foo f(10);
>>> __builtin_printf (" bar1 is %d \n", f.bar1());
>>> __builtin_printf (" bar2 is %d \n", f.bar2());
>>> __builtin_printf (" bar3 is %d \n", f.bar3());
>>> return 0;
>>> }
>>> [opc@qinzhao~]$ g++ t2.C; ./a.out
>>>  bar1 is 10
>>>  bar2 is 20
>>>  bar3 is 10
>>> 
>>> Explanation: The member variable "m" is declared inside the structure "foo" 
>>> but outside any member function of "foo", so it has instance scope. This 
>>> variable is visible to all the member functions of the structure "foo". 
>>> When there is a name conflict with a local variable inside a member 
>>> function, as in "bar2", the local variable has higher precedence. To refer 
>>> explicitly to the member variable in that case, prefix it with the C++ 
>>> "this" pointer, as in "bar3".
>>> 
>>> NOTE 2: the {} brace that marks the boundary of a structure/class changes 
>>> the
>>> variable scope to "instance scope" in C++.  
>>> 
>>> B. Static Member Scope
>>> 
>>> The static member scope refers to variables declared with the static 
>>> keyword 
>>> within the class/structure. These variables can be accessed using the class 
>>> name without creating the instance.
>>> 
>>> [opc@qinzhao~]$ cat t3.C
>>> struct foo {
>>> static int m; // Static member variable with static member scope,
>>> // accessible in whole structure/class
>>> };
>>> int foo::m = 10;
>>> int main ()
>>> {
>>> __builtin_printf (" foo::m is %d\n", foo::m);
>>> return 0;
>>> }
>>> [opc@qinzhao~]$ g++ t3.C; ./a.out
>>> foo::m is 10
>>> 
>>> NOTE 3: static member in structure is not available in C.   
>>> 
>>> C. Namespace Scope
>>> 
>>> A namespace in C++ is a container that allows users to create a separate 
>>> scope 
>>> where the given variables are defined. It is used to avoid name conflicts 
>>> and group 
>>> related code together. These variables can be accessed using their 
>>> namespace 
>>> name and scope resolution operator.
>>> 
>>> [opc@qinzhao~]$ cat t4.C
>>> namespace foo {
>>> int m = 10; // Namespace scope variable
>>> };
>>> int main ()
>>> {
>>> __builtin_printf (" foo::m is %d\n", foo::m);
>>> return 0;
>>> }
>>> [opc@qinzhao~]$ g++ t4.C; ./a.out
>>> foo::m is 10
>>> 
>>> NOTE 4: namespaces are not available in C language.  
>>> 
>>> 3. A simple summary comparing C to C++
>>> 
>>> A. there are only two variable scopes in C:
>>> 
>>> global scope
>>> local scope
>>> 
>>> The other 3 variable scopes in C++, i.e., instance scope (member scope), 
>>> static member scope, and namespace scope, are not available in C.  
>>> 
>>> Since there are no static members or namespaces in the C language, access 
>>> to static member variables of a structure or to variables declared in 
>>> another namespace is not needed in C at all.
>>> 
>>> NOTE 5: However, accessing the member of a structure inside the structure 
>>> is 
>>> needed for the purpose of counted_by extension in C.  
>>> 
>>> B. the {} brace that represents the boundary of the structure does not 
>>> change the scope of the variable in C, since C doesn't have instance scope 
>>> (i.e., member scope).
>>> 
>>> The following examples show this limitation in the C language.
>>> 
>>> C currently supports variable length arrays (VLAs), whose array size can be 
>>> a variable expression. VLAs are only supported in local scopes in C.
>>> 
>>> [opc@qinzhao~]$ cat t5.c
>>> void boo (int k)
>>> {
>>> const int n = 10;
>>> struct foo {
>>>  int m;
>>>  int a[n + k];
>>> };
>>> }
>>> [opc@qinzhao~]$ gcc t5.c -S
>>> 
>>> Explanation: This is good. The {} brace that marks the boundary of the 
>>> structure "foo" does NOT change the scope of the variables n and k; their 
>>> definitions reach the declaration of the array member field a[n + k].
>>> 
>>> However, when changing the testing case as:
>>> [opc@qinzhao~]$ cat t6.c
>>> void boo (int k)
>>> {
>>> const int n = 10;
>>> struct foo {
>>>  int m;
>>>  int a[n + m];
>>> };
>>> }
>>> [opc@qinzhao~]$ gcc t6.c -S
>>> t6.c: In function ‘boo’:
>>> t6.c:6:15: error: ‘m’ undeclared (first use in this function)
>>>  6 |     int a[n + m];
>>>    |               ^
>>> 
>>> Explanation: C does not have the concept of instance scope (member scope), 
>>> and provides no syntax to access instance scope (member scope) variables 
>>> inside a structure. Therefore, the member variable "m" is not visible 
>>> inside the declaration of the array member field a[n + m].
>>> 
>>> 4. What are the possible approaches for the counted_by attribute as a C 
>>> extension?
>>> 
>>> The major thing for this extension is:  
>>> Adding a new language feature in C to access the member variables inside a 
>>> structure.
>>> 
>>> Based on the previous comparison between C and C++, there are two possible 
>>> approaches:
>>> 
>>> A. Add a new variable scope: instance scope (member scope) into C  
>>> 
>>> The definition of the new instance scope of C is:
>>> 
>>> The instance scope, also called member scope, refers to the region inside a
>>> structure.
>>> The variables, i.e., the members, declared here are accessible to the whole
>>> structure.
>>> They can be accessed by the object (i.e., the instance) of the structure.
>>> 
>>> The {} braces that mark the boundary of a structure will change the
>>> variable scope to "instance scope"; a variable name conflict between other
>>> scopes (including global/local) and instance scope will be resolved in favor
>>> of instance scope.
>>> 
>>> The compiler's implementation of this approach could be:
>>> ** a new variable scope, "instance scope", is added to the C FE;
>>> ** the "instance scope" has higher precedence than the current
>>> global/local scope;
>>> ** the {} braces for the boundary of a structure are the boundary of the
>>> "instance scope";
>>> ** a member variable that is referenced inside this structure could be
>>> treated as this->member;
>>> ** a reference to a global variable inside the structure needs a new syntax.
>>> 
>>> B. Add a new syntax to access instance scope (member scope) variable within
>>>   the structure while keeping C's default scoping rules.
>>> 
>>> The {} braces that mark the boundary of a structure will NOT change the
>>> variable scope. There are still only two variable scopes, global and local.
>>> 
>>> In order to explicitly access a member inside a structure, a new syntax
>>> needs to be added.  This new syntax could reuse the current designator
>>> syntax in C (prefixing the member variable with "."), or add a new keyword
>>> similar to "this", such as "__self", and prefix the member variable with
>>> "__self.".
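
As a reminder of what the existing designator syntax looks like, here is a small sketch using designated initializers, where ".count" and ".data" name members without an object expression in front. The struct buf and the function make_buf are hypothetical names chosen for illustration; the counted_by spellings in the trailing comment are the proposed forms under discussion, not syntax this sketch assumes any compiler accepts.

```c
#include <assert.h>

struct buf {
    int count;
    const char *data;
};

/* Existing C designator syntax: ".count" and ".data" name members of the
   object being initialized, without introducing any new scope. */
struct buf make_buf(const char *s, int len)
{
    assert(len >= 0);
    struct buf b = { .count = len, .data = s };
    return b;
}

/* Under approach B, the attribute argument might be spelled with a similar
   prefix, for example counted_by(.count) or counted_by(__self.count).
   Those spellings are only proposals, so they stay in this comment. */
```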
>>> 
>>> With the above approach A, we can keep the current syntax for counted_by,
>>> but it is not clear how easily it can be extended to simple expressions and
>>> nested structures.
>>> 
>>> However, the major problem with this approach is that it changes the default
>>> scoping rule in the C language. This additional variable scope will break
>>> existing legal C code:
>>> 
>>> [opc@qinzhao~]$ cat t7.c
>>> void boo (int k)
>>> {
>>> const int n = 10; // a local variable n
>>> struct foo {
>>>  int n;     // a member variable n
>>>  int a[n + 10];  // currently, this n refers to the local variable n.
>>> };
>>> }
>>> 
>>> If we take approach A, within the structure "foo", the VLA a[n+10]
>>> will refer to the member variable n, and no longer to the local variable n.
>>> Existing code with VLAs might then work incorrectly.
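
The current lookup rule can be observed with a portable variant of t7.c. As an assumption made for portability, this sketch replaces the const local with a block-scope enum constant, so the array has a constant size and no VLA-in-structure extension is needed; the function name elements_in_a is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of elements of foo.a under today's C scoping rules. */
size_t elements_in_a(void)
{
    enum { n = 10 };        /* block-scope n, standing in for the local n */
    struct foo {
        int n;              /* member n: lives in the member name space */
        int a[n + 10];      /* today this n is the enclosing-scope n */
    };
    struct foo f;

    /* Compile-time check that the enclosing n (10) was used. */
    static_assert(sizeof f.a == 20 * sizeof(int), "a has 20 elements");
    return sizeof f.a / sizeof f.a[0];
}
```

Under approach A the same declaration would instead resolve n to the member, so the meaning of code like this would change without any edit to the source.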
>>> 
>>> One could argue for adding the new variable scope only for the counted_by
>>> attribute, not for VLAs, but then how should the following case be handled:
>>> 
>>> [opc@qinzhao~]$ cat t8.c
>>> void boo (int k)
>>> {
>>> const int n = 10; // a local variable n
>>> struct foo {
>>>  int n;     // a member variable n
>>>  int a[n + 10];  // for VLA, this n refers to the local variable n.
>>>  char *b __attribute__ ((counted_by(n + 10)));
>>>    // for counted_by, this n refers to the member variable n.
>>> };
>>> }
>>> 
>>> This will be a disaster.  
>>> 
>>> So, I think that approach A is not the right direction for a C
>>> extension.
>>> 
>>> With the above approach B, a new syntax needs to be implemented,
>>> and all the previous source code changes in applications need to be
>>> modified.
>>> 
>>> But I still think that approach B is the right direction to go.  
>>> (Please refer to:
>>> ******Scope of variables in C++
>>> https://www.geeksforgeeks.org/scope-of-variables-in-c/
>>> ******Scope of variables in C
>>> https://www.geeksforgeeks.org/scope-rules-in-c/)
>>> 
>>> 
>>> Appendix B: An example in the Linux kernel where the global cannot be
>>> "const" qualified
>>> 
>>> In the Linux kernel, the globals that will be referred to inside the
>>> counted_by attribute don't change value, but they cannot be marked "const"
>>> since they are initialized during very early kernel boot.
>>> 
>>> They _become_ architecturally read-only, i.e., they are in a memory region
>>> that is flipped to read-only after boot is finished.
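
A userspace sketch of that pattern, under the assumption that a single init routine stands in for early boot. All names here (max_entries, early_init, entry_index_ok) are hypothetical, and the sketch does not model the page-protection step that makes the memory read-only in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global: it cannot be const because it is assigned at startup. */
static size_t max_entries;
static int init_done;

/* Hypothetical early-init routine: sets the global exactly once. */
void early_init(size_t detected_entries)
{
    assert(!init_done);
    max_entries = detected_entries;
    init_done = 1;
}

/* After init, the global is only read and serves as a bound. */
int entry_index_ok(size_t i)
{
    assert(init_done);
    return i < max_entries;
}
```

From the language's point of view max_entries is an ordinary mutable global, even though it behaves as a constant once initialization has finished.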
