Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

David Tarditi Sat, 15 Mar 2025 09:20:10 -0700

I’ve been working on bounds-safety extensions for C since 2016. I did the
Checked C work.

Here’s my perspective on the discussion. I think language features should make 
the common case easy to use, have concise syntax, and be easy to understand.  
The bounds safety extensions will be used by millions of developers and need to 
carry over to C++.  They are a big change to C. We should design the extensions 
with this in mind.

Bounds attributes do not have to follow the rules for VLAs; they are new and 
can be different if that provides a better design.  The simplest design for 
bounds attributes is to allow the use of the identifier that is closest in 
scope in the bounds attributes.  For parameters, this would be other parameters 
on the parameter list.  For members, this would be other members.  This is what 
Deputy, Checked C, and -fbounds-safety do.  If this leads to differences in
bounds attributes with existing VLA scoping behaviors, that’s fine. We can 
provide diagnostics for the differences.  My guess is that it will be rare 
because VLA usage is rare.
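A minimal sketch of the member case, assuming a compiler that supports `counted_by` on flexible array members (recent GCC and Clang do). As I understand it, today's GCC and Clang look the argument up among the structure's members, so a file-scope `len` from some header would not be picked up. `COUNTED_BY`, `buf_new` and `buf_get` are hypothetical names used only for illustration, and the macro expands to nothing on compilers without the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* COUNTED_BY is a hypothetical helper macro: it uses the counted_by
   attribute where the compiler provides it and expands to nothing
   otherwise, so the sketch compiles either way. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

/* The argument of counted_by names the sibling member `len`. */
struct buf {
    size_t len;
    int data[] COUNTED_BY(len);
};

/* Allocate a buf holding `len` zeroed ints; returns NULL on failure. */
static struct buf *buf_new(size_t len) {
    struct buf *b = malloc(sizeof *b + len * sizeof(int));
    if (b != NULL) {
        b->len = len;   /* set the count before touching data[] */
        for (size_t i = 0; i < len; i++)
            b->data[i] = 0;
    }
    return b;
}

/* Checked read: the bound comes from the member named in counted_by. */
static int buf_get(const struct buf *b, size_t i) {
    assert(i < b->len);
    return b->data[i];
}
```

The count member is assigned before any element is written, which is the order the attribute expects.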

This design also follows the principle of “least surprise”.  If someone in a 
header file that I depend upon introduces a global variable `len`, as a 
programmer, I’ll be very surprised if my bounds attributes suddenly pick that 
up instead of the `len` that is defined in the nearest scope.

There’s been a lot of discussion about shadowing.  It is an uncommon case for 
bounds safety. We ran into it on occasion in the research on Checked C but it 
was rare. Apple’s experience is that it is rare too.  Shadowing should be 
*avoided* in security sensitive code.  Most security/correctness coding 
standards ban shadowing or frown on it severely.  It leads to complex, 
bug-prone code. 

As John McCall pointed out, for function parameters, shadowing can be handled 
trivially in a local fashion by renaming parameters.  That leaves only the case 
of name overlap between a structure member and a global variable. A good design
point would be to prefer the member name in the bounds attribute and, for the
rare case where someone really needs the global, provide an operator for
referring to it.
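A small sketch of the parameter case. The file-scope `len` and the functions `sum_shadowed` and `sum_renamed` are hypothetical. In the first function the parameter hides the global, which is the kind of declaration GCC's `-Wshadow` reports. Renaming the parameter is a purely local change that makes both names reachable again.

```c
#include <assert.h>

/* Hypothetical file-scope variable that shares its name with a parameter. */
static int len = 100;

/* The parameter `len` shadows the global; inside this function the
   global is unreachable. GCC's -Wshadow flags this declaration. */
static int sum_shadowed(const int *a, int len) {
    int s = 0;
    for (int i = 0; i < len; i++)
        s += a[i];
    return s;
}

/* Local fix: rename the parameter, and both names are reachable. */
static int sum_renamed(const int *a, int n) {
    assert(n <= len);   /* `len` refers to the global again */
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```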

I don’t agree that the bounds-safety extension needs to align closely with the 
design for VLAs.  It’d be nice, but a good design should come first.  The VLA 
design was wrong for bounds safety to begin with. It seems wrong to let its 
usability mistakes, which led to the scoping mistakes we’re now debating, 
propagate into scoping mistakes for a new feature.

We considered using VLAs for Checked C back in 2016.  We found that you 
couldn’t use them to write bounds attributes for common C functions because the 
size parameters came second and there was no way to reference them (this is the 
source of one scoping problem). You also couldn’t use them to write a variable 
length buffer because there was no way to reference a member (this is the 
source of the second scoping problem):

struct buf {
    int len;
    int data[len];   /* rejected: `len` does not name the member here */
};
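A sketch of the first scoping problem, assuming a C99-or-later compiler with VLA parameter support: a VLA bound may name only parameters declared earlier in the list, so standard C forces the size first. `zero_n` is a hypothetical function. GCC's parameter forward declarations, a GNU extension, are the usual workaround for the memset-style order.

```c
#include <assert.h>
#include <stddef.h>

/* Valid ISO C: the bound `n` is declared before the array parameter. */
static void zero_n(size_t n, int p[n]) {
    assert(n == 0 || p != NULL);
    for (size_t i = 0; i < n; i++)
        p[i] = 0;
}

/* The conventional memset-style order cannot carry a VLA bound:
       void zero(int p[n], size_t n);      error: `n` is not yet declared
   GCC accepts a forward declaration of the parameter as an extension:
       void zero(size_t n; int p[n], size_t n);                         */
```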

The VLA semantics are also problematic. Users can modify the variables used in 
a VLA's size expression after the declaration, while the declaration is still 
in scope.  It is not obvious how to implement bounds checking, because the 
declared bounds may no longer be valid.  It is a design flaw that the declared 
bounds can become invalid while the declaration is in scope.
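A minimal sketch of the problem, with `vla_len_after_update` as a hypothetical name. C evaluates the size expression once, when the declaration is reached, and the array keeps that size for its lifetime. After `n` changes, the declaration still reads "array of `n` ints", but `n` no longer describes the array.

```c
#include <assert.h>
#include <stddef.h>

/* Declares a VLA with bound `n`, then changes `n` while the declaration
   is still in scope. The array keeps the size computed at the
   declaration, so the bound as written no longer describes it. */
static size_t vla_len_after_update(int n) {
    assert(n > 0);
    int a[n];        /* size expression evaluated once, here */
    n = n + 1;       /* legal: `n` is still in scope and mutable */
    return sizeof a / sizeof a[0];   /* the original n, not the new one */
}
```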

Finally, when you get to nested pointers, the syntax (and typing) for VLAs 
becomes confusing. This isn’t VLAs’ fault; it goes back to the original C design.
Consider a function `f` that takes an array of singleton pointers as an 
argument `ptrarr` and does an update through one of the singleton pointers.
`f` has a straightforward signature and implementation:

void f(int len, int **ptrarr, int index, int val) {
   if (len > index)
     *(ptrarr[index]) = val;
}

Now consider the VLA version of this. It turns into

void f(int len, int (*(ptrarr[len]))[1], int index, int val) {
  if (len > index)
    (*(ptrarr[index]))[0] = val;
}

Reading from inside out, ptrarr is “an array of length len” of values of type 
“pointer to an array of 1 integer element”.  

Note that an extra indirection is needed for the use of ptrarr.  It converts 
the “pointer to an array” into an array, which can then be indexed.  We can introduce a 
temporary variable `p` for the read from ptrarr that shows this type:

void f(int len, int (*(ptrarr[len]))[1], int index, int val) {
  if (len > index) {
    int (*p)[1] = ptrarr[index];
    (*p)[0] = val;
  }
}

To have a variable that refers to an array passed as data, we need to use 
“pointer to array of T”, because declaring a variable of type “array of T” 
creates a new array object rather than referring to an existing one.
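One way to keep such declarations readable in today's C is a typedef for the array type. This is a sketch, not part of any proposal: `one_int`, `f_typedef` and `f_typedef_demo` are hypothetical names. The parameter `ptrarr` has the same type as in the VLA version of `f` above. Unlike the original, this sketch also rejects negative indices.

```c
#include <assert.h>

/* Hypothetical typedef for "array of 1 int". */
typedef int one_int[1];

/* Same parameter type as int (*(ptrarr[len]))[1]: an array of `len`
   pointers to one_int (adjusted to one_int ** by the compiler). */
static void f_typedef(int len, one_int *ptrarr[len], int index, int val) {
    if (index >= 0 && len > index) {
        one_int *p = ptrarr[index];   /* pointer to array of 1 int */
        (*p)[0] = val;                /* dereference, then index */
    }
}

/* Caller side: each element must point at a whole int[1] object. */
static void f_typedef_demo(void) {
    int x[1] = {0}, y[1] = {0};
    one_int *arr[2] = { &x, &y };
    f_typedef(2, arr, 1, 42);
    assert(x[0] == 0 && y[0] == 42);
}
```

The caller has to build the arguments from `int[1]` objects, because `&x` on a plain `int` would have type `int *`, not `one_int *`.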

To be clear, we’re not suggesting changes to scoping rules for existing C code, 
since that would break existing code.  We do want bounds attributes to follow 
different scoping rules. We think this is justified because (1) it will result 
in a better design for the common case and (2) the compiler can diagnose the 
case where identifiers are being used with different meaning in the same local 
scope.

> On Mar 13, 2025, at 11:48 AM, JeanHeyd Meneide <phdoftheho...@gmail.com> 
> wrote:
> 
> On Thu, Mar 13, 2025 Martin Uecker <uec...@tugraz.at> wrote:
>> ...
>> 
>> So it seems to be a possible way forward while avoiding
>> language divergence and without introducing anything too novel
>> in either language.
>> 
>> (But others still have concerns about .n and prefer __self__.)
> 
>      I would like to gently push back on __self__, or __self, or self, 
> because all of these identifiers are fairly common in code. When 
> I was writing the paper for __self_func ( 
> https://thephd.dev/_vendor/future_cxx/papers/C%20-%20__self_func.html ), I 
> searched GitHub and other source code indexing and repository services: 
> __self, __self__, and self have a substantial number of uses. If there's an 
> alternative spelling to consider, I think that would be helpful.
> 
>     I would also like to offer that other people have approached me about 
> `::` as a way to help disambiguate identifiers and prevent local shadowing in 
> macros ( see: https://github.com/ThePhD/future_cxx/issues/69 ). However, I 
> don't think it helps with the case of this GCC extension:
> 
> int main () {
>     int n = 1; // a local variable n
>     struct foo {
>         int n;     // a member variable n
>         int a[n + 10];  // for VLA, this n refers to the local variable n.
>         //char *b __attribute__ ((counted_by(n + 10)))
>         // for counted_by, this n refers to the member variable n.
>     };
> }
> 
>       If you use `::n`, this allows you to reference a global variable. But 
> the contentious `n` here isn't a global variable, it's a local. So it's not 
> of much help here. If you stack another "n" at the global scope, you then 
> have another problem:
> 
> extern int n;
> int main () {
>     int n = 1; // a local variable n, shadows global
>     struct foo {
>         int n;     // a member variable n
>         int a[n + 10];  // for VLA, this n refers to the local variable n.
>         //char *b __attribute__ ((counted_by(n + 10)))
>         // for counted_by, this n refers to the member variable n.
>     };
> }
> 
> Now, even if you use C++-style `::n`, and then use the rules proposed by 
> context-sensitivity, it becomes impossible to refer to the local variable 
> outside of the struct without additional annotation. You get the opposite of 
> this problem with `${KEYWORD}.n` (${KEYWORD} as a placeholder for __self, 
> since I still have the above-named problems with __self): it enables 
> referring to the structure member with ${KEYWORD}, and the local variable 
> with nothing, but it still leaves the global variable unreachable. Part of 
> this problem is self-inflicted: VLAs in structures are a GNU
> extension and not an ISO C feature (for reasons like this one). But it's 
> still technically a problem, and we can't necessarily step on GCC's 
> affordance to make an extension in this space, so whatever we come up with we 
> will have both problems to fix.
> 
>      I see 2 plausible ways forward, though I've only thought about this for 
> 4 days:
> 
>      (0) Accept that Yeoul (and the others) are correct in that issuing an 
> error (diagnostic) for this case would be better. Effectively, it's just bad 
> code and you ask the user to change the local variable from e.g. `n`, which 
> is something they should have control over (theoretically). Then, standardize 
> `::n` to refer to the global. The local variable could have a different name, 
> the name in the structure might be similar to a global (but is found by 
> counted_by's lookup), and the global variable has to be named explicitly with 
> `::n`. This does not necessarily solve the forward reference problem, but all
> solutions proposed here require delayed resolution (especially to deal with 
> the common struct case), so this seems like a moot point in-general.
> 
>      (1) Accept that we need ${KEYWORD}, or ${DOT}, to refer to members. This
> does not solve the problem where a local variable shadows a global variable, 
> so even if this path is taken I would still suggest `::n` to go with it, so 
> that we can solve the problem where a local variable shadows a global 
> variable. Then there's no new real "lookup rule", so people who feel like 
> we're violating C's core design space might feel less uneasy because you have 
> to use the new syntax (a keyword or `.`) to access in-struct things. This 
> still has a forward reference problem, so it's once again moot whether or not 
> the forward reference problem can be solved here.
> 
>      The (0) solution can be seen as more "natural"; there's no dots, no 
> keyword, but it requires a potential change in local variables for 
> conflicting cases. `::global` comes along for the ride as the way to separate 
> member fields from globals. I could see this working and, as I understand it, 
> this is the direction Clang is currently taking (?).
> 
>      The (1) solution can be seen as less "natural"; it requires extra syntax 
> to say what is, overwhelmingly, the common use case and ISO 
> standard-supported use case to make way for a pathological GNU extension in 
> VLA members. It becomes a bit more natural if you use ${DOT} rather than 
> ${KEYWORD}, thanks to designated initializers now being in both ISO C 
> and ISO C++.
> 
>      An additional solution that has been proposed (but the author dropped 
> the proposal) is _Outer. The proposal was in the context of macros normally, 
> but it applies to this situation too ( 
> https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2679.pdf ). You could use 
> (0) + _Outer as a means of annotating the pathological case, and diagnose 
> (error) when a local variable plus a member field have the same name. This 
> would also get you over the finish line, without needing to change the name 
> of a local C variable as well. It would also not require you to add the 
> _Outer until you write problematic code.
> 
>       I'm sure that this is not helpful, as I'm just sort of stating a bunch 
> of different ways to solve the problem without really doing any complex 
> analysis. I think that having VLA members in structures take their size from 
> local variables rather than from member variables, when the extension was first 
> introduced, was a mistake that is now having bad consequences for this discussion.
> My preference for solutions is (0), then (1), but this is only a reflection 
> of personal expectation. It's also colored by having to also think about this 
> problem for __counted_by / bounds attributes for function parameters, which 
> is facing similar issues between choosing a parameter name vs. choosing a 
> global variable. I think `_Outer` would be helpful if neither (0) nor (1) 
> finds traction, as an agreeable middle ground that has other applications.
> 
>      I think that, at least for array syntax and attribute syntax, some 
> amount of delayed resolution (in structures and parameter lists) would be 
> both expedient and wise.
> 
> Sincerely,
> JeanHeyd
