Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

Martin Uecker Mon, 02 Dec 2024 13:13:48 -0800

Am Montag, dem 02.12.2024 um 20:15 +0000 schrieb Qing Zhao:
> 
> > On Nov 30, 2024, at 07:22, Martin Uecker <uec...@tugraz.at> wrote:
> > 
> > Am Dienstag, dem 26.11.2024 um 20:59 +0000 schrieb Qing Zhao:
> > > Think this over these days, I have another thought that need some 
> > > feedback:
> > > 
> > > The major issue right now is:
> > > 
> > > 1. For the following structure in which the “counted_by” attributes is 
> > > attached to the pointer field.
> > > 
> > > struct foo {
> > >  int n;
> > >  char *p __attribute__ ((counted_by (n)));
> > > } *x;
> > 
> > BTW: I also do not like the syntax 'n' for a lookup of a member 
> > in the member namespace. I think it should be '.n'.  For FAM this 
> > is less problematic because it always at the end, but here it is 
> > problematic.
> During my implementation of extending this attribute to pointer fields of 
> structure. 
> I didn’t notice  issue with the current syntax ’n’ for the pointer fields so 
> far, 
> even though when  the field “n” is declared after the corresponding pointer 
> field, i.e:
> 
> struct foo {
> {
>   char *p __attribute__ ((counted_by (n)));
>   int n;
> }
> 
> So, could you please explain a little bit more on what’s the potential issue 
> here?


My issue with it is that it is not consistent to how C's scoping
of identifiers work. In

constexpr int n = 3;
struct foo {
{
  char (*p)[n] __attribute__ ((counted_by (n))
  int n;
}

the n in the type of p refers to the previous n in scope while
the n in your attribute would refer to the member.

This is incoherent and confusing.  It becomes worse should
you ever want to allow more complicated expressions.

It would be clearer if you the syntax ".n" which resembles
the syntax for designated initializers that is already used
in initializers to refer to struct members.

constexpr int n;
struct foo {
{
  char (*p)[n] __attribute__ ((counted_by (.n))
  int n;
}


> 
> 
> > > 
> > > There is one important additional requirement:
> > > 
> > > x->n, x->p can ONLY be changed by changing the whole structure at the 
> > > same time. 
> > > Otherwise, x->n might not be consistent with x->p.
> > 
> > By itself, this would still not fix the issue I pointed out.
> > 
> > struct foo x;
> > x = .. ; // set the whole structure
> > char *p = x->p;
> > x = ... ; // set the whole structure
> > 
> > What is the bound for 'p' ?  
> 
> Since p was set to the pointer field of the old structure, then the bound of 
> it should be the old bound. 
> > With current rules it would be the old bound.
> 
> I thought that this should be the correct behavior, isn’t it?

Yes, sorry, what I meant was "with the current rules it would be
the *new* bound".  (And I guess this is why you suggest to potentially
change it below).

Martin

> 
> > 
> > 
> > > 
> > > 2. This new requirement is ONLY for “counted_by” attribute that is 
> > > attached to the pointer field, not needed
> > > for flexible array members.
> > > 
> > > 3. Then there will be inconsistency for the “counted_by” attribute 
> > > between FAM and pointer field.
> > 
> > 
> > > The major questions I have right now:
> > > 
> > > 1. Shall we keep this inconsistency between FAM and pointer field? 
> > > Or, 
> > > 
> > > 2. Shall we keep them consistent by adding this new requirement for the 
> > > previous FAM? 
> > 
> > Or have a new attribute?  I feel we double down on a bad design
> > which made sense for FAM addressing a very specific use case
> > (retrofitting checks to the Linux kernel) but is otherwise not
> > very strong.
> 
> I do agree that the previous specific feature of the “counted_by" for the 
> FAM, i.e:
> "
>     One important feature of the attribute is, a reference to the
>     flexible array member field uses the latest value assigned to the
>     field that represents the number of the elements before that
>     reference.  For example,
> 
>            p->count = val1;
>            p->array[20] = 0;  // ref1 to p->array
>            p->count = val2;
>            p->array[30] = 0;  // ref2 to p->array
> 
>     in the above, 'ref1' uses 'val1' as the number of the elements in
>     'p->array', and 'ref2' uses 'val2' as the number of elements in
>     'p->array’.
> “
> 
> Is a specific feature for Linux kernel, I am wondering whether the above 
> feature
> really needed by the Linux kernel? 
> 
> If Not, I prefer to eliminate this specific feature from GCC before we 
> officially release the “counted_by”
> attribute in GCC15.
> 
> If Linux kernel does need this feature, Yes, maybe a new attribute for 
> pointer is better.
> 
> Kees, could you please also provide more comments and suggestion on this?
> 



> 
> > 
> > Martin
> > 
> > > 
> > > Kees, the following feature that you requested for the FAM:
> > > 
> > > "
> > >     One important feature of the attribute is, a reference to the
> > >     flexible array member field uses the latest value assigned to the
> > >     field that represents the number of the elements before that
> > >     reference.  For example,
> > > 
> > >            p->count = val1;
> > >            p->array[20] = 0;  // ref1 to p->array
> > >            p->count = val2;
> > >            p->array[30] = 0;  // ref2 to p->array
> > > 
> > >     in the above, 'ref1' uses 'val1' as the number of the elements in
> > >     'p->array', and 'ref2' uses 'val2' as the number of elements in
> > >     'p->array’.
> > > "
> > > Has this feature been used by Linux kernel already?
> > > Is this feature really needed by Linux kernel?
> > > 
> > > Thanks a lot for suggestions and comments.
> > > 
> > > Qing
> > > 
> > > > On Nov 20, 2024, at 14:23, Martin Uecker <uec...@tugraz.at> wrote:
> > > > 
> > > > Am Mittwoch, dem 20.11.2024 um 17:37 +0000 schrieb Qing Zhao:
> > > > > Hi, Martin,
> > > > > 
> > > > > Thanks a lot for pointing this out. 
> > > > > 
> > > > > This does look like a problem we need avoid for the pointer arrays.
> > > > > 
> > > > > Does  the same problem exist in the language extension too if the n 
> > > > > is allowed to be changed after initialization?
> > > > > 
> > > > > If so, for the future language extension, is there any proposed 
> > > > > solution to this problem? 
> > > > > 
> > > > 
> > > > There is no specification yet and nothing formally proposed, so
> > > > it is entirely unclear at this point.
> > > > 
> > > > My idea would be to give 'x->buf' the type '(char(*)[x->n])'
> > > > where 'x->n' is loaded at the same time 'x->buf' is accessed
> > > > and then the value is frozen (like in the simpler versions
> > > > of 'counted_by' you had implemented first).  Of course, one
> > > > would then have to set x->n before setting the buffer (or
> > > > at the same time). This could be ensured by making the
> > > > member 'n' const, so that it can only be changed by
> > > > overwriting the whole struct. But I am still thinking
> > > > about this.
> > > > 
> > > > In any case, I think for "counted_by" this is not an option
> > > > because it would be confusing if it works differently
> > > > than for the FAM case. 
> > > > 
> > > > Martin
> > > > 
> > > > 
> > > > > Qing
> > > > > > On Nov 20, 2024, at 10:52, Martin Uecker <uec...@tugraz.at> wrote:
> > > > > > 
> > > > > > Am Mittwoch, dem 20.11.2024 um 15:27 +0000 schrieb Qing Zhao:
> > > > > > > 
> > > > > > > > On Nov 19, 2024, at 10:47, Marek Polacek <pola...@redhat.com> 
> > > > > > > > wrote:
> > > > > > > > 
> > > > > > > > On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
> > > > > > > > > Am Montag, dem 18.11.2024 um 17:55 +0000 schrieb Qing Zhao:
> > > > > > > > > > Hi,
> > > > > > > 
> > > > > > ..
> > > > > > 
> > > > > > Hi Qing,
> > > > > > 
> > > > > > > Per our discussion so far, if treating the following
> > > > > > > 
> > > > > > > struct foo {
> > > > > > > int n;
> > > > > > > char *p __attribute__ ((counted_by (n)));
> > > > > > > };
> > > > > > > 
> > > > > > > as an array with upper-bounds being “n” is reasonable.
> > > > > > 
> > > > > > There is one issue I want to point out, which I just realized during
> > > > > > a discussion in WG14.  For "counted_by" we defined the semantics
> > > > > > in a way that later store to 'n' will be taken into account. 
> > > > > > We did this to support the use case where 'n' is set after
> > > > > > the access to 'p'.
> > > > > > 
> > > > > > struct foo *x = ;
> > > > > > 
> > > > > > char *q = x->p;
> > > > > > x->n = 100; // this should apply
> > > > > > 
> > > > > > 
> > > > > > For FAMs this is fine, because it is a fixed part
> > > > > > of the struct.  But for the new pointer case, I think this is
> > > > > > problematic.  Consider this example:
> > > > > > 
> > > > > > struct foo *x = allocate_buffer(100);
> > > > > > 
> > > > > > where x->n is set to the right value in the allocation function.
> > > > > > 
> > > > > > Now let's continue with
> > > > > > 
> > > > > > char *q = x->p;
> > > > > > 
> > > > > > x = allocate_buffer(50);
> > > > > > // x->n is set to 50.
> > > > > > 
> > > > > > Now q refers to the old buffer, but x->n to the size of the new
> > > > > > buffer.  That does not seem right and scares me a little bit.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Martin
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Then, it’s reasonable to extend -fsanitize=bounds to instrument 
> > > > > > > the corresponding reference for the pointer with
> > > > > > > Counted-by attribute. 
> > > > > > > 
> > > > > > > What do you think?
> > > > > > > 
> > > > > > > Qing
> > > > > > > 
> > > > > > > > 
> > > > > > > > > I think the question is what -fsanitize=bounds is meant to be.
> > > > > > > > > 
> > > > > > > > > I am a bit frustrated about the sanitizer.  On the
> > > > > > > > > one hand, it is not doing enough to get spatial memory
> > > > > > > > > safety even where this would be easily possible, on the
> > > > > > > > > other hand, is pedantic about things which are technically
> > > > > > > > > UB but not problematic and then one is prevented from
> > > > > > > > > using it
> > > > > > > > > 
> > > > > > > > > When used in default mode, where execution continues, it
> > > > > > > > > also does not mix well with many warning, creates more code,
> > > > > > > > > and pulls in a libary dependency (and the library also depends
> > > > > > > > > on upstream choices / progress which seems a limitation for
> > > > > > > > > extensions).
> > > > > > > > > 
> > > > > > > > > What IMHO would be ideal is a protection mode for spatial
> > > > > > > > > memory safety that simply adds traps (which then requires
> > > > > > > > > no library, has no issues with other warnings, and could
> > > > > > > > > evolve independently from clang) 
> > > > > > > > > 
> > > > > > > > > So shouldn't we just add a -fboundscheck (which would 
> > > > > > > > > be like -fsanitize=bounds -fsanitize-trap=bounds just with
> > > > > > > > > more checking) and make it really good? I think many people
> > > > > > > > > would be very happy about this.
> > > > > > > > 
> > > > > > > > That's a separate concern.  We already have the -fbounds-check 
> > > > > > > > option,
> > > > > > > > currently only used in Fortran (and D?), so perhaps we could 
> > > > > > > > make
> > > > > > > > that option a shorthand for -fsanitize=bounds 
> > > > > > > > -fsanitize-trap=bounds.
> > > > > > > > 
> > > > > > > > Marek
> 
>

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

Reply via email to