Re: bounds checking / VLA types

Martin Uecker via Gcc Tue, 18 Mar 2025 00:17:48 -0700

On Monday, 17 March 2025 at 16:45 -0700, Yeoul Na via Gcc wrote:
> > 
> > > > On Mar 15, 2025, at 1:27 AM, Martin Uecker <uec...@tugraz.at> wrote:


...

> > > > 
> > > > > > > > Anyway, a lot of this changes if we want to use the same 
> > > > > > > > concept for
> > > > > > > > non-local pointers to arrays, because we no longer have an 
> > > > > > > > obvious point of
> > > > > > > > execution at which to evaluate the bounds expression. Instead, 
> > > > > > > > we are forced
> > > > > > > > into re-evaluating it every time we access the variable holding 
> > > > > > > > the array.
> > > > > > > > Consider:
> > > > > > > > struct X {
> > > > > > > >     int count;
> > > > > > > >     int (*ptr)[count * 10]; // using my preferred syntax
> > > > > > > > };
> > > > > > > > void test(struct X *xp) {
> > > > > > > >     // For the purposes of the conversion check here, the
> > > > > > > >     // source type is int (*)[< xp->count * 10 >], freshly
> > > > > > > >     // evaluated as part of the member access.
> > > > > > > >     int (*local)[100] = xp->ptr;
> > > > > > > > }
> > > > 
> > > > Yes, except for the syntax, this is how we plan to do it.
> > 
> > That would be great, but I’m curious how VLA types can be evolved
> > to support this safely without making it inconsistent with how local
> > pointers to arrays are currently handled.
> > Are you imagining something like below where the VLA types work
> > differently depending on the declaration the types are applied to?
> > Or did you mean consistently fixing how the existing VLA types work?

The point in time at which the size expression is evaluated needs
to be specified. For a size expression that contains .n, this
can only be the time when the structure member of VM
type is accessed.
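
For comparison, existing C already fixes the evaluation point for a
local variably-modified type: the size expression is evaluated once,
at the declaration. A minimal sketch (the function name
size_after_change is made up for illustration, and it assumes
len > 0):

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size of the array type that p points to, measured
   after the variable used in the size expression has changed. */
size_t size_after_change(int len)
{
    assert(len > 0);          /* a VLA bound must be positive */
    int buf[len];             /* backing storage so that p is valid */
    int (*p)[len] = &buf;     /* size expression is evaluated here */

    len = 5;                  /* does not change the type of p */
    return sizeof(*p);        /* still the original len * sizeof(int) */
}
```

A struct member has no such declaration point inside the function,
which is why the member access is the candidate for the evaluation.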

> > 
> > 
> > struct X {
> >     int count;
> >     int (*ptr)[count];
> > };
> > 
> > void test(struct X *xp) {
> >     int len = 10;
> >     int (*ptr)[len] = malloc(sizeof(int) * len);
> > 
> >     xp->count = 100; // does dynamically change the type of xp->ptr.
> >     xp->ptr = malloc(sizeof(int) * 100); // hence, this works fine.

This has yet to be decided, but the following is one way
this could be done.

When xp->ptr is accessed, the size expression
is evaluated and the lvalue which is formed at this point in
time gets the type int(*)[xp->count], similar to how the size
expressions of function arguments are evaluated when the
function is entered. The type of this expression
does not change afterwards. You could implement it
by rewriting it to the following.

struct X {
 int count;
 int (*ptr)[];
};

#define ACCESS(xp) *( (int(**)[(xp)->count]) &(xp)->ptr)

xp->count = 100;
ACCESS(xp) = malloc(sizeof(int) * 100);

https://godbolt.org/z/r9WTb86xM

So this could be implemented in the compiler front-end,
as it builds on top of the existing framework for
variably-modified types.
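
For illustration, a self-contained version of the rewrite above might
look as follows. The helper names x_alloc and x_bytes are hypothetical
and not part of any proposal; the macro is the one from above with an
extra pair of outer parentheses. This is only a sketch of what the
front-end could generate, written by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct X {
    int count;
    int (*ptr)[];   /* pointer to an array of unknown bound */
};

/* View the member as a pointer to int[count]. The size expression is
   evaluated each time the macro is used. Note that xp is expanded
   twice, so it should not have side effects. */
#define ACCESS(xp) (*(int (**)[(xp)->count]) &(xp)->ptr)

/* Hypothetical helper: set the count, then assign the pointer through
   the typed view. Returns nonzero on success. */
int x_alloc(struct X *xp, int n)
{
    assert(n > 0);                      /* a VLA bound must be positive */
    xp->count = n;
    ACCESS(xp) = malloc(sizeof(int) * (size_t)n);
    return ACCESS(xp) != NULL;
}

/* Hypothetical helper: size in bytes of the pointed-to array, taken
   from the type formed at the access. Assumes ptr is valid. */
size_t x_bytes(struct X *xp)
{
    return sizeof(*ACCESS(xp));
}
```

With count set to 100, x_bytes yields 100 * sizeof(int), so other code
can read the bound from the type instead of tracking it separately.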


To make it safe we would need to also enforce some additional
rules, e.g. that you have to (re-)assign the pointer after
setting the count. This could be implemented by adding new
warnings.
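
As a rough sketch of what code following such a rule could look like
today: the hypothetical helper x_resize below (not a proposed API) is
meant to be the only place that writes count, and it always re-assigns
the pointer right after doing so. The struct and macro are repeated so
that the snippet stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct X {
    int count;
    int (*ptr)[];
};

#define ACCESS(xp) (*(int (**)[(xp)->count]) &(xp)->ptr)

/* Hypothetical helper: change the bound and the allocation together.
   Returns 0 on success, -1 on failure. On failure the old count and
   pointer are left untouched and therefore still agree. */
int x_resize(struct X *xp, int n)
{
    if (n <= 0)
        return -1;              /* a VLA bound must be positive */

    void *p = realloc(xp->ptr, sizeof(int) * (size_t)n);
    if (p == NULL)
        return -1;

    xp->count = n;              /* set the count first ... */
    ACCESS(xp) = p;             /* ... then (re-)assign the pointer */
    return 0;
}
```

A warning of the kind mentioned above would then flag any write to
count that is not followed by such a re-assignment.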

Martin



> > 
> >     len = 5; // does not change the type of ptr.
> >     ptr = malloc(sizeof(int) * len); // hence, this traps at run time. 
> > }
> > 


> > 
> > > > 
> > > > Based on my experience with other languages, I also think it is
> > > > useful to be able to refer to bounds outside of the struct
> > > > definition, e.g.
> > > > 
> > > > int foo(int n, struct foo { char (*ptr)[n]; } x);
> > > > 
> > > > struct foo { char (*ptr)[10]; } x = ...;
> > > > foo(10, x);
> > > > 
> > > > which GCC happens to support already:
> > > > 
> > > > https://godbolt.org/z/ja7qvMqPE
> > > > 
> > > > 
> > > > This becomes useful if you want to be able to express the
> > > > relation of different types, e.g. that two data structures
> > > > used as input for something point to buffers of the same length.
> > > > 
> > > > But the dependent sum type that we still miss is definitely
> > > > more important.
> > > > 
> > > > > > > > This has several immediate consequences.
> > > > > > > > Firstly, we need to already be able to compute the correct 
> > > > > > > > bound when we do
> > > > > > > > the dynamic checks for assignments into this field. For local 
> > > > > > > > variably-
> > > > > > > > modified types, everything in the expression was already in 
> > > > > > > > scope and
> > > > > > > > presumably initialized, so this wasn’t a problem. Here, we’re 
> > > > > > > > not helped
> > > > > > > > by scope, and we are dependent on the count field already 
> > > > > > > > having been
> > > > > > > > initialized.
> > > > > > > > Secondly, we must be very concerned about anything that could 
> > > > > > > > change the
> > > > > > > > result of this evaluation. So we cannot allow an arbitrary 
> > > > > > > > expression;
> > > > > > > > it must be something that we can fully analyze for what could 
> > > > > > > > change it.
> > > > > > > > And if it refers to variables or fields (which it presumably 
> > > > > > > > always will), we
> > > > > > > > must prevent assignments to those, or at least validate that any
> > > > > > > > assignments aren’t causing unsound changes to the bound 
> > > > > > > > expression.
> > > > > > > > Thirdly, that concern must apply non-locally: if we allow the 
> > > > > > > > address of the
> > > > > > > > pointer field to be taken (which is totally fine in the local 
> > > > > > > > case!),
> > > > > > > > we can no longer directly reason about mutations through that pointer, 
> > > > > > > > so we
> > > > > > > > have to prevent changes to the bounds variables/fields while 
> > > > > > > > the pointer is
> > > > > > > > outstanding.
> > > > > > > > And finally, we must be able to recognize combinations of 
> > > > > > > > assignments,
> > > > > > > > because when we’re initializing (or completely rewriting) this 
> > > > > > > > structure,
> > > > > > > > we will need to be able to assign to both count and ptr and not 
> > > > > > > > have the
> > > > > > > > same restrictions in place that we would for separate 
> > > > > > > > assignments.
> > > > 
> > > > Right, to make it safe you need to enforce such additional rules.
> > > > 
> > > > > > > > None of this falls out naturally from separate, local language 
> > > > > > > > rules; it
> > > > > > > > all has to be invented for the purpose of serving this dynamic 
> > > > > > > > check.
> > > > 
> > > > This is true, we see this already for "counted_by". Still, even
> > > > without enforcement of additional rules, you have some benefits:
> > > > 
> > > > - The bound is visibly connected to the pointer. Currently, it is
> > > > not and one has to guess.
> > > > - The access (or other operations) via the pointer can be 
> > > > bounds-checked,
> > > > assuming the bound is set correctly 
> > > > - The bound can be read via sizeof in other code, which prevents
> > > > errors.
> > > > - You can use it to build safer abstractions around it
> > > > (For example I am experimenting with a vector type and you can
> > > > access its content as an array that is bounds-checked in this way)
> > > > 
> > > > 
> > > > Then in some opt-in strict mode we could enforce additional
> > > > rules that make it perfectly safe.
> > > > 
> > > > 
> > > > > > > >  And in fact, -fbounds-safety has to do all of this already 
> > > > > > > > just to make
> > > > > > > > basic checks involving pointers in structs work.
> > > > 
> > > > I assumed it has to do similar checks anyway.
> > > > 
> > > > > > > > If that can all be established, though, I think the 
> > > > > > > > type-conversion-based
> > > > > > > > approach using variably-modified types has some very nice 
> > > > > > > > properties as a
> > > > > > > > complement to what we’re doing in -fbounds-safety.
> > > > 
> > > > I think we are in agreement here.
> > > > 
> > > > > > > > For one, it interacts with the -fbounds-safety analysis very 
> > > > > > > > cleanly. If
> > > > > > > > bounds in types are dynamically enforced (which is not true in 
> > > > > > > > normal C,
> > > > > > > > but could be in this dialect), then the type becomes a source 
> > > > > > > > for reliable
> > > > > > > > information for the bounds-safety analysis.
> > > > > > > >  Conversely, if
> > > > > > > > a pointer is converted to a variably-modified type, the 
> > > > > > > > analysis done
> > > > > > > > by -fbounds-safety could be used as an input to the conversion 
> > > > > > > > check.
> > > > > > > > For another, I think it may lead towards a cleaner story for 
> > > > > > > > arrays of
> > > > > > > > pointers to arrays than -fbounds-safety can achieve today, as 
> > > > > > > > long as
> > > > > > > > the inner arrays are of uniform length.
> > > > > > > > But ultimately, I think it’s still at best a complement to the 
> > > > > > > > attributes
> > > > > > > > we need for -fbounds-safety.
> > > > 
> > > > I agree we need to have the attributes, if just for annotating
> > > > legacy APIs where you can not change the types.
> > > > 
> > > > Martin
> > > > 
> > > > > > > > 
> > > > 
> > > > 
> >
