Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

John McCall Fri, 14 Mar 2025 21:22:24 -0700

On 14 Mar 2025, at 15:18, Martin Uecker wrote:

Am Freitag, dem 14.03.2025 um 14:42 -0400 schrieb John McCall:
On 14 Mar 2025, at 14:13, Martin Uecker wrote:
Am Freitag, dem 14.03.2025 um 10:11 -0700 schrieb David Tarditi:
Hi Martin,

The C design of VLAs misunderstood dependent typing.
They probably did not care about theory, but the design is
not inconsistent with theory.
This is almost true, but for bad reasons. The theory of dependenttypes is heavily concerned with deciding whether two types are thesame, and C simply sidesteps this question because type identity islargely meaningless in C. Every value of variably-modified type is(or decays to) a pointer, and all pointers in C freely convert to oneanother (within the object/function categories). _Generic is based ontype compatibility, not equality. So in that sense, the standarddoesn’t say anything inconsistent with theory because it doesn’teven try to say anything.
The reason it is not quite true is that C does have rules forcompatible and composite types, and alas, those rules forvariably-modified types are not consistent with theory. Two VLA typesof compatible element type are always statically consideredcompatible, and it’s simply UB if the sizes aren’t the same. Thecomposite type of a VLA and a fixed-size array type is always thefixed-size array type. The standard is literally incomplete about thecomposite type of two VLAs; if you use a ternary operator where bothoperands are casts to VLA types, the standard just says it’sstraight-up just undefined behavior (because one of the types has abound that’s unevaluated) and doesn’t even bother telling us whatthe static type is supposed to be.
Yes, I guess this is all true.

But let's rephrase my point a bit more precisely: One could take
a strict subset of C that includes variably modified types but
obviously has to forbid a lot other things (e.g. arbitrary pointer
conversions or unsafe down-casts and much more) and make this a
memory-safe language with dependent types. This would also
require adding run-time checks at certain places where there
is now UB, in particular where two VLA types need to be compatible.

Mmm. You can certainly subset C to the point that it’s memory-safe,butit wouldn’t really be anything like C anymore. As long as C has aheap,

I don’t see any path to achieving temporal safety without significant

extensions to the language. But if we’re just talking about spatialsafety,

then sure, that could be a lot closer to C today.

Is that your vision, then, that you’d like to see the same sort ofchecksthat `-fbounds-safety` does, but you want them based firmly in thelanguage

as a dynamic check triggered by pointer type conversion, with bounds

specified using variably-modified types? It’s a pretty elegant vision,andI can see the attraction. It has some real merits, which I’ll get tobelow.

I do see at least two significant challenges, though.

The first and biggest problem is that, in general, array bounds can onlybeexpressed on a pointer value if it’s got pointer to array type. Most Carraycode today works primarily with pointers to elements; programmers justusearray types to create concrete arrays, and they very rarely use pointersto

array type at all. There are a bunch of reasons for that:

- Pointers to arrays have to be dereferenced twice: `(*ptr)[idx]`instead

  of `ptr[idx]`.

- That makes them more error-prone, because it is easy to do pointer
  arithmetic at the wrong level, e.g. by writing `ptr[idx]`, which will
  stride by multiples of the entire array size. That may even pass the

compiler without complaint because of C’s laxness aboutconversions.

- Keeping the bound around in the pointer type is more work anddoesn’t do

  anything useful right now.

- A lot of C programmers dislike nested declarator syntax and can’trememberhow it works. Those of us who can write it off the top of our headsare

  quite atypical.

Now, there is an exception: you can write a parameter using an arraytype,and it actually declares a pointer parameter. You could imagine usingthis

as a syntax for an enforceable array bound for arguments, although the
committee did already decide that these bounds were meaningless without

`static`. Unfortunately, you can’t do this in any other position andstillend up with just a pointer, so it’s not helpful as a general syntaxfor

associating bounds with pointers.

The upshot is that this isn’t really something people can just adoptbyadding annotations. It’s not just a significant rewrite, it’s arewrite thatprogrammers will have very legitimate objections to. I think that makesthisat best a complement to the “sidecar” approach taken by`-fbounds-safety`

where we can track top-level bounds to a specific pointer value.

The second problem is that there are some extralingual problems that
`-fbounds-safety` has to solve around bounds that *aren’t* just local
evaluations of bounds expressions, and a type-conversion-driven approach
doesn’t help with any of them.

As you mentioned, the design of variably-modified types is based on
evaluating the bounds expression at some specific point in the program
execution. Since these types can only be written locally, the evaluation
point is obvious. If we wanted to dynamically enforce bounds during

initialization, it would simply be another use of the same computedbound:


```
int count = ...;
int (*ptr)[count * 10] = source_ptr;
```

Here we would evaluate `count * 10` exactly once and use it both as (1)partof the destination type when initializing `ptr` with `source_ptr` and(2)part of the type of `ptr` for all uses of it. For example, if`source_ptr`

were of type `int (*)[100]`, we would dynamically check that
`count * 10 <= 100`. This all works perfectly with an arbitrary bounds
expression; it could even contain an opaque function call.

Note that we don’t need any special behavior specifically for

initialization. If we later assign a new value into `ptr`, we will stillbeconverting the new value to the type `int (*)[< count * 10 >]`, usingthevalue computed at the time of declaration of the variable. This modelwouldsimply require that conversion to validate the bounds during assignmentjust

as it would during initialization.

Now, with nested arrays, variance does become a problem. Let’s reduce

bounds expression to their evaluated bounds to make this easier towrite.


- `int (*)[11]` can be converted to `int(*)[10]` because we’re simply
  allowing fewer elements to be used.
- By the same token, `int (*(*)[11])[5]` can be converted to

`int (*(*)[10])[5]`. This is the same logic as the above, just withan

  element type that happens to be a pointer to array type.

- But `int (*(*)[11])[5]` cannot be safely converted to `int(*(*)[11])[4]`,because while it’s safe to *read* an `int (*)[4]` from this array,it’s

  not safe to assign one into it.
- `int (* const (*)[11])[5]` can be safely converted to

`int (* const (*)[11])[4]`, but only if this dialect also enforcesconst-

  correctness, at least on array pointers.

Anyway, a lot of this changes if we want to use the same concept for

non-local pointers to arrays, because we no longer have an obvious pointofexecution at which to evaluate the bounds expression. Instead, we areforcedinto re-evaluating it every time we access the variable holding thearray.

Consider:

```
struct X {
  int count;
  int (*ptr)[count * 10]; // using my preferred syntax
};

void test(struct X *xp) {
  // For the purposes of the conversion check here, the
  // source type is int (*)[< xp->count * 10 >], freshly
  // evaluated as part of the member access.
  int (*local)[100] = xp->ptr;
}
```

This has several immediate consequences.

Firstly, we need to already be able to compute the correct bound when wedo

the dynamic checks for assignments into this field. For local variably-
modified types, everything in the expression was already in scope and

presumably initialized, so this wasn’t a problem. Here, we’re nothelped

by scope, and we are dependent on the `count` field already having been
initialized.

Secondly, we must be very concerned about anything that could change the
result of this evaluation. So we cannot allow an arbitrary expression;
it must be something that we can fully analyze for what could change it.

And if refers to variables or fields (which it presumably always will),we

must prevent assignments to those, or at least validate that any
assignments aren’t causing unsound changes to the bound expression.

Thirdly, that concern must apply non-locally: if we allow the address ofthe

pointer field to be taken (which is totally fine in the local case!),
we can no directly reason about mutations through that pointer, so we

have to prevent changes to the bounds variables/fields while the pointeris

outstanding.

And finally, we must be able to recognize *combinations* of assignments,

because when we’re initializing (or completely rewriting) thisstructure,we will need to able to assign to both `count` and `ptr` and not havethe

same restrictions in place that we would for separate assignments.

None of this falls out naturally from separate, local language rules; it

all has to be invented for the purpose of serving this dynamic check.And

in fact, `-fbounds-safety` has to do all of this already just to make
basic checks involving pointers in structs work.

If that can all be established, though, I think thetype-conversion-basedapproach using variably-modified types has some very nice properties asa

complement to what we’re doing in `-fbounds-safety`.

For one, it interacts with the `-fbounds-safety` analysis very cleanly.If

bounds in types are dynamically enforced (which is not true in normal C,

but could be in this dialect), then the type becomes a source forreliable

reliable information for the bounds-safety analysis. Conversely, if
a pointer is converted to a variably-modified type, the analysis done
by `-bounds-safety` could be used as an input to the conversion check.

For another, I think it may lead towards an cleaner story for arrays of
pointers to arrays than `-fbounds-safety` can achieve today, as long as
the inner arrays are of uniform length.

But ultimately, I think it’s still at best a complement to theattributes

we need for `-fbounds-safety`.

John.

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

Reply via email to