On Mon, 25 Nov 2024, Martin Uecker wrote:

> 
> Hi Richard,
> 
> here is another version.  This now just ignores the size for all trailing
> arrays which I think is the right thing to do.  It also modifies the lto
> hashing which also seems to work (but needs more testing and I haven't
> added tests to the patch yet).
> 
> I also added back the missing comparison for the field offset.
> 
> Instead of checking for flexible arrays members, for the exceptions
> I now simply use RECORD_OR_UNION_TYPE_P.  The alternative would be to
> add the bit, but this would then be for any nested trailing array 
> - not FAM.  
> 
> But I am not entirely sure adding the bit is worth it, as checking
> for structures seems to be good enough - maybe for "verify_type"
> we could then spend more time and check recursively for a trailing
> array.  
> 
> Before I continue with this, what do you think?
> 
> Comments and tests still need some improvements.
> 
> Martin
> 
> 
> 
>     Fix type compatibility for types with flexible array member 
> [PR113688,PR114014,PR117724]
>     
>     verify_type checks the compatibility of TYPE_CANONICAL using
>     gimple_canonical_types_compatible_p.   But it is stricter than what the
>     C standard requires and therefor inconsistent with how TYPE_CANONICAL is 
> set
>     in the C FE.  Here, the logic is changed to ignore the array size when one
>     is the last element of a structure or union.  To not get errors because of
>     an inconsistent number of members, zero-sized arrays are not ignored 
> anymore
>     when checking the fields of a struct (which is stricter than what was done
>     before).  Finally, exceptions are added that allow the TYPE_MODE of a type
>     with an array as last member to differ from another compatible type.
>     
>             PR c/113688
>             PR c/114014
>             PR c/117724
>     
>     gcc/ChangeLog:
>             * tree.cc (gimple_canonical_types_compatible_p): Revise
>             logic for types with array as last member.
>             (verify_type): Add exceptions.
>     
>     gcc/lto/ChangeLog:
>             * lto-common.cc (hash_canonical_type): Add exceptions.
>     
>     gcc/testsuite/ChangeLog:
>             * gcc.dg/pr113688.c: New test.
>             * gcc.dg/pr114014.c: New test.
>             * gcc.dg/pr117724.c: New test.
> 
> diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
> index 86a309f92b4..f65a9d1c7b6 100644
> --- a/gcc/lto/lto-common.cc
> +++ b/gcc/lto/lto-common.cc
> @@ -254,7 +254,8 @@ hash_canonical_type (tree type)
>       checked.  */
>    code = tree_code_for_canonical_type_merging (TREE_CODE (type));
>    hstate.add_int (code);
> -  hstate.add_int (TYPE_MODE (type));
> +  if (!RECORD_OR_UNION_TYPE_P (type))
> +    hstate.add_int (TYPE_MODE (type));
>  
>    /* Incorporate common features of numerical types.  */
>    if (INTEGRAL_TYPE_P (type)
> @@ -332,7 +333,11 @@ hash_canonical_type (tree type)
>           && (! DECL_SIZE (f)
>               || ! integer_zerop (DECL_SIZE (f))))
>         {
> -         iterative_hash_canonical_type (TREE_TYPE (f), hstate);
> +         tree t = TREE_TYPE (f);
> +         if (!TREE_CHAIN (f)
> +             && TREE_CODE (t) == ARRAY_TYPE)
> +           t = TREE_TYPE  (t);

I think this is fine, but it warrants a comment on why we do this.

> +         iterative_hash_canonical_type (t, hstate);
>           nf++;
>         }
>  
> diff --git a/gcc/testsuite/gcc.dg/pr113688.c b/gcc/testsuite/gcc.dg/pr113688.c
> new file mode 100644
> index 00000000000..8dee8c86f1b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr113688.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g" } */
> +
> +struct S{int x,y[1];}*a;
> +int main(void){
> +     struct S{int x,y[];};
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/pr114014.c b/gcc/testsuite/gcc.dg/pr114014.c
> new file mode 100644
> index 00000000000..ab783f4f85d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr114014.c
> @@ -0,0 +1,14 @@
> +/* PR c/114014
> + * { dg-do compile }
> + * { dg-options "-std=c23 -g" } */
> +
> +struct r {
> +  int a;
> +  char b[];
> +};
> +struct r {
> +  int a;
> +  char b[0];
> +};   /* { dg-error "redefinition" } */
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/pr117724.c b/gcc/testsuite/gcc.dg/pr117724.c
> new file mode 100644
> index 00000000000..d631daeb644
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr117724.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g" } */
> +
> +struct {
> +  unsigned long len;
> +  unsigned long size;
> +  char data[];
> +};                   /* { dg-warning "unnamed struct" } */
> +struct {
> +  struct {
> +    unsigned long len;
> +    unsigned long size;
> +    char data[6];
> +  };
> +};                   /* { dg-warning "unnamed struct" } */
> +
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 125f38b1cfa..30e3b4531a1 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -13907,8 +13907,12 @@ gimple_canonical_types_compatible_p (const_tree t1, 
> const_tree t2,
>        || TREE_CODE (t1) == NULLPTR_TYPE)
>      return true;
>  
> -  /* Can't be the same type if they have different mode.  */
> -  if (TYPE_MODE (t1) != TYPE_MODE (t2))
> +  /* Can't be compatible types if they have different mode.  We allow
> +     mismatching modes for structures or unions because they could end
> +     with flexible array member.  */
> +  if (!RECORD_OR_UNION_TYPE_P (t1)
> +      && !RECORD_OR_UNION_TYPE_P (t2)
> +      && (TYPE_MODE (t1) != TYPE_MODE (t2)))
>      return false;

I think this is OK.  There's also the point that the mode of a
RECORD_OR_UNION_TYPE_P is really only used for RTL expansion purposes.

You might argue checking !RECORD_OR_UNION_TYPE_P (t1) is enough,
duplicating the check into the relevant switch cases (and omitting
from record/union type) might be clearer.  The mode check was supposed
to be a cheap early test.  I'd go with just checking 
!RECORD_OR_UNION_TYPE_P on t1 I guess.

I'll note that for ARRAY_TYPE the mode is similarly "irrelevant".

>    /* Non-aggregate types can be handled cheaply.  */
> @@ -13959,7 +13963,7 @@ gimple_canonical_types_compatible_p (const_tree t1, 
> const_tree t2,
>      {
>      case ARRAY_TYPE:
>        /* Array types are the same if the element types are the same and
> -      the number of elements are the same.  */
> +      minimum and maximum index are the same.  */
>        if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE 
> (t2),
>                                               trust_type_canonical)
>         || TYPE_STRING_FLAG (t1) != TYPE_STRING_FLAG (t2)
> @@ -14053,23 +14057,35 @@ gimple_canonical_types_compatible_p (const_tree t1, 
> const_tree t2,
>            f1 || f2;
>            f1 = TREE_CHAIN (f1), f2 = TREE_CHAIN (f2))
>         {
> -         /* Skip non-fields and zero-sized fields.  */
> -         while (f1 && (TREE_CODE (f1) != FIELD_DECL
> -                       || (DECL_SIZE (f1)
> -                           && integer_zerop (DECL_SIZE (f1)))))
> +         /* Skip non-fields.  We used to skip zero-sized fields, but not
> +            allow them only at the end.  */

I think this now makes

   int i : 32;
   int : 0;
   int j;

and

   int i;
   int j;

incompatible?  Did you remove this because of actual issues or
because you thought it shouldn't matter?  The underlying idea
is of course that a zero-sized field can be ignored and we
want to conservatively treat structs which only differ in
added zero-size fields as compatible (with LTO, for cross-language
TBAA inter-operability).  So the bitfield case might not be
the only relevant thing.

> +         while (f1 && (TREE_CODE (f1) != FIELD_DECL))
>             f1 = TREE_CHAIN (f1);
> -         while (f2 && (TREE_CODE (f2) != FIELD_DECL
> -                       || (DECL_SIZE (f2)
> -                           && integer_zerop (DECL_SIZE (f2)))))
> +         while (f2 && (TREE_CODE (f2) != FIELD_DECL))
>             f2 = TREE_CHAIN (f2);
>           if (!f1 || !f2)
>             break;

So previously struct { int n; } and struct { int n; int data[]; } were
not compatible by this check.  I guess also a trailing int : 0 made
structs incompatible.

I think we want to continue to ignore intermediate zero-size fields
but change the loops to

  while (f1 && (TREE_CODE (f1) != FIELD_DECL
                || (DECL_SIZE (f1)
                    && integer_zerop (DECL_SIZE (f1))
                    && TREE_CHAIN (f1))))

so we preserve the last field even when zero-size?  It's a bit odd
we treat incomplete (!DECL_SIZE) fields specially, but those should
only appear as last field, maybe || (TREE_CHAIN (f1) && integer_zerop 
(DECL_SIZE (f1)) should work to avoid ICEing for those.  But
eventually we get into this function for not laid out types where
the assumption might not hold ...


> +
> +         tree t1 = TREE_TYPE (f1);
> +         tree t2 = TREE_TYPE (f2);
> +
> +         /* Special case for array members at the end.  */

.. because?

> +         if (TREE_CHAIN (f1) == NULL_TREE
> +             && TREE_CHAIN (f2) == NULL_TREE
> +             && TREE_CODE (t1) == ARRAY_TYPE
> +             && TREE_CODE (t2) == ARRAY_TYPE
> +             && TYPE_REVERSE_STORAGE_ORDER (t1) == 
> TYPE_REVERSE_STORAGE_ORDER (t2)
> +             && TYPE_NONALIASED_COMPONENT (t1) == TYPE_NONALIASED_COMPONENT 
> (t2)

recursing on ARRAY_TYPE would have compared TYPE_STRING_FLAG

> +             && gimple_compare_field_offset (f1, f2)
> +             && gimple_canonical_types_compatible_p
> +                     (TREE_TYPE (t1), TREE_TYPE (t2),
> +                      trust_type_canonical))
> +           ;
>           /* The fields must have the same name, offset and type.  */
> -         if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)
> +         else if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)

any reason you elide DECL_NONADDRESSABLE_P comparing above?  I think
this is semantically relevant (only used from Ada).

Given you basically only elide comparing of TYPE_DOMAIN I wonder if
we shouldn't add another parameter to gimple_canonical_types_compatible_p
indicating whether to disregard that?  Or split out a helper function
to compare the array-type-but-not-domain bits to make those the same
between recursing or open-coding.

I'd suggest to do

         /* The fields must have the same offset.  */
         if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)
             || !gimple_compare_field_offset (f1, f2))
           return false;

upfront to isolate the sole difference on trailing ARRAY_TYPE
handling better.

I think the new mode handling makes sense in general, the LTO bits
should work fine.  The zero-size field ignoring stuff may have
too many unrelated changes.  So definitely better now.

Thanks,
Richard.

>               || !gimple_compare_field_offset (f1, f2)
>               || !gimple_canonical_types_compatible_p
> -                   (TREE_TYPE (f1), TREE_TYPE (f2),
> -                    trust_type_canonical))
> +                   (t1, t2, trust_type_canonical))
>             return false;
>         }
>  
> @@ -14213,6 +14229,10 @@ verify_type (const_tree t)
>      }
>  
>    if (COMPLETE_TYPE_P (t) && TYPE_CANONICAL (t)
> +      /* We allow a mismatch for structure or union because of
> +      flexible array members.  */
> +      && !RECORD_OR_UNION_TYPE_P (t)
> +      && !RECORD_OR_UNION_TYPE_P (TYPE_CANONICAL (t))
>        && TYPE_MODE (t) != TYPE_MODE (TYPE_CANONICAL (t)))
>      {
>        error ("%<TYPE_MODE%> of %<TYPE_CANONICAL%> is not compatible");
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to