Re: bounds checking / VLA types

Bill Wendling via Gcc Tue, 25 Mar 2025 19:11:48 -0700

On Tue, Mar 18, 2025 at 3:04 PM Martin Uecker <uec...@tugraz.at> wrote:
>
> On Tuesday, 18.03.2025 at 14:03 -0700, Yeoul Na via Gcc wrote:
> >
> > > On Mar 18, 2025, at 12:48 PM, Martin Uecker <uec...@tugraz.at> wrote:
> > >
> > > On Tuesday, 18.03.2025 at 09:52 -0700, Yeoul Na via Gcc wrote:
> > > >
> > > > > On Mar 18, 2025, at 12:16 AM, Martin Uecker <uec...@tugraz.at> wrote:
> > > > >
> > > > > When xp->ptr is accessed, the size expression
> > > > > is evaluated and the lvalue which is formed at this point in
> > > > > time gets the type int(*)[xp->count], similar to how the size
> > > > > expressions of function arguments are evaluated when the
> > > > > function is entered. The type of this expression
> > > > > then does not change anymore. You could implement it
> > > > > by rewriting it to the following.
> > > >
> > > > Yes, that’s effectively similar to how the size of a bounds
> > > > annotation works too.
> > > >
> > > > My point, though, was that VLA types currently don't work this way.
> > > >
> > > > The size is evaluated when the variable is declared, as you all
> > > > discussed earlier.
> > >
> > > This is the case only for automatic variables.  For function arguments
> > > they are evaluated at function entry.
> >
> > Well, I think putting it this way is a bit misleading. It works this
> > way because function parameters are automatic variables of the function
> > and their declarations are reached at the entry of the function.
>
> There is no control flow which goes through the declarations
> of the arguments in the function prototype, which would define
> exactly when in what order these "declarations are reached",
> while this is clear for function-scope declarations which
> are actually reached by control flow.
>
> The C standard just explicitly states that on function
> entry those size expressions are evaluated.
>
> > And by “only for automatic variables”, do you mean that global
> > and static variables do not, and in the future will not, work
> > the same way as automatic variables?
>
> I should have said objects defined at function-scope.
> static variables already work like this.
>
> > For global and static variables, I meant “pointers" to VLA types.
> >
> > Also, what about a pointer to pointer to an array? Should it
> > be fixed to be evaluated at the time of use?
>
> According to the C standard, the size expression
> for the example below is evaluated on function entry.
>
> We try to not break existing code in C without a very
> good reason, so I do not think this can be changed
> (but it also seems just fine to me how it is).
>
> >
> > int foo(int len, int (**inout_buf)[len]) {
> >   // use *inout_buf
> >   // ...
> >
> >   int new_len = 10;
> >   // should it be fixed to be evaluated at the time of use?
> >   *inout_buf = malloc(sizeof(int) * new_len);
> >
> >   return new_len;
> > }
> >
> >
> > >
> > > > So do you mean you want to fix the current behavior or make the
> > >
> > > I think the current behavior is fine, and changing it
> > > would also be problematic for existing code.
> > >
> > > > behavior inconsistent between if they are struct members,
> > > > indirect pointers vs. local variables like below?
> > >
> > > I think the nature of the problem requires that the size
> > > expression must be evaluated at different time points
> > > depending on the scenario.
> > >
> > > For automatic variables, this is when the declaration is
> > > reached in the control flow and for function arguments when
> > > the function is entered.  For structure members where the size
> > > expression refers to other members, they need to be
> > > evaluated when the member is accessed.
> > >
> > > >
> > > > void test(struct X *xp) {
> > > >    int len = 10;
> > > >    // (1) int (*ptr)[len] is evaluated here and fixed.
> > > >    int (*ptr)[len] = malloc(sizeof(int) * len);
> > > >
> > > >    xp->count = 100;
> > > >    // size of xp->ptr is dynamically evaluated with xp->count,
> > > >    // hence, this works fine.
> > > >    xp->ptr = malloc(sizeof(int) * 100);
> > > >
> > > >    len = 5; // (2)
> > > >    // size of ptr is still fixed to be (1), so this traps at run time.
> > > >    ptr = malloc(sizeof(int) * len);
> > > > }
> > >
> > > Yes.
> >
> > I think the fact that the behavior diverges in this subtle
> > way will make things very confusing.
> >
> I use it in this way (using a macro for access) and think
> it works very nicely.
>
> > I'm not sure adding a “dot” actually does much to help users
> > comprehend this code.
> >
> Based on my humble experience, I would say that it helps people
> to avoid mistakes when things that do behave differently do not
> look identical.
>
> > That’s why I was quite skeptical that VLAs in C can ever evolve
> > to model the size with a struct member safely, given that the VLA
> > design made a wrong choice from the beginning.
>
> I don't think a wrong choice was made.
>
> The alternative would be to have the type of a variable
> change whenever a variable in the size expression changes.
> I think this would be much harder to understand and cause
> more errors.
>
> For structure members there is no alternative.  But it is
> also less problematic than for other variables, because we
> need to restrict the expressions which we allow in this
> context anyhow.  I also like things to be consistent, but
> different things also need to be treated differently.
>
>
> We could try to make things more restrictive, e.g. require
> these expressions to be "const" in some safety profile,
> which would make such differences go away.
>
> Martin
>
>
The discussions seem to have died down. We need to arrive at a
consensus for what to do here. Otherwise, both GCC's and Clang's
implementations are dead in the water. There doesn't seem to be
agreement that the conflict that Qing pointed out is important enough
for us to revise the syntax. So what then are we to do?


From what I could tell, the people at Apple are okay with adding a
"__self" (or similar) keyword but not requiring it. Instead, in the
absence of "__self", an unknown identifier is resolved to a struct /
union member. "__self" should then be used in situations that
may cause confusion (e.g. the VLA example). (Adding checks based on
VLA types would be cool, but isn't really what we're trying to achieve
in this email thread. :-)
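
To make the "evaluated on access" model for struct members concrete,
here is a rough sketch in plain C that emulates it with an access
macro. The layout of struct X, the X_PTR macro and the x_len helper are
all made up for illustration; this is not the proposed "__self" syntax,
and it is not anyone's actual implementation from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, loosely mirroring the "struct X" in the thread. */
struct X {
    int count;
    int *ptr;
};

/* Hypothetical access macro: every use forms a pointer of type
 * int (*)[xp->count], so the bound is read at the point of access
 * rather than once at declaration. */
#define X_PTR(xp) ((int (*)[(xp)->count])(xp)->ptr)

/* Number of elements the type system sees behind xp->ptr right now. */
size_t x_len(struct X *xp)
{
    assert(xp->count > 0);
    return sizeof(*X_PTR(xp)) / sizeof(int);
}
```

With this emulation, changing xp->count between two calls to x_len
changes the result, which is the divergence from ordinary VLA
declarations that the quoted discussion is about.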

On the other hand, some view the VLA example as a clear indicator that
we made a fatal error early on---i.e., we've added an entirely new
scoping rule to C. And while we have a clear example of how to handle
an "instance scope" because of C++, it hasn't been adopted by the
standards committee (yet), and it's possible for us to get something
wrong.
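
For anyone skimming, the VLA behavior in question is that a size
expression is evaluated once and the resulting type is then fixed. A
minimal sketch in standard C (C99 or later with VLA support assumed;
the function names are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Automatic variable: the size expression is evaluated when the
 * declaration is reached; later writes to len do not change the type. */
size_t len_fixed_at_declaration(void)
{
    int storage[10];
    int len = 10;
    int (*ptr)[len] = &storage;
    len = 5;                      /* type of ptr is still int (*)[10] */
    return sizeof(*ptr) / sizeof(int);
}

/* Function parameter: the size expression is evaluated on function
 * entry; later writes to n do not change the type of p. */
size_t len_fixed_at_entry(int n, int (*p)[n])
{
    assert(n > 0);
    n = 1;                        /* type of p keeps the value n had on entry */
    return sizeof(*p) / sizeof(int);
}
```

Both functions report the length that was in effect when the type was
formed, not the current value of the variable named in the size
expression.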

Then we have the situation that not requiring __self and only
resolving to struct / union members doesn't encompass all requested
features, e.g. use of (const or "non-const but we know this will never
change") global variables that the Linux people want.

It seems clear that using "__self" is most likely going to be part of
any solution we come up with. What we need to avoid is feature skew
between GCC and Clang. That would send projects back to the "compiler
war" days, which isn't beneficial to anyone. Is there a compromise
that's acceptable for both GCC and Clang beyond adding "__self", but
not requiring it?

-bw

> > That said, this is just my own impression, and whether we can make
> > the model usable this way may need more study.
> >
> > At least now I understand what is your vision. Thanks.
> >
> > >
> > > For GNU C (but not ISO C) there could be a size expression in
> > > the declaration referring to automatic variables.
> > >
> > > int foo()
> > > {
> > >  int n = 1;
> > >  struct {
> > >   char buf[n]; // evaluated when the declaration is reached
> > >  } x;
> > > }
> > >
> > > This is something which can not currently happen for ISO C where
> > > this can only refer to named constants and where the size is
> > > evaluated at compile time.
> > >
> > > int foo()
> > > {
> > >  enum { N = 1 };
> > >  struct {
> > >    char buf[N];
> > >  } x;
> > > }
> > >
> > > Still, I think this is another reason why it would be much
> > > better to have new syntax for this.  We also need to restrict
> > > the size expressions in all three scenarios in different ways.
> > >
> > > int foo()
> > > {
> > >  int n = 1;
> > >  struct {
> > >
> > >    int m;
> > >    char buf1[n];    // evaluated when the declaration is reached
> > >    char buf2[.m];   // evaluated on access to buf2, restricted syntax
> > >  } x = { .m = 10 };
> > > }
> > >
> > >
> > > Martin
> >
> > Yeoul
> >
>
