> On 11.12.2024 at 14:50, Alexandre Oliva <ol...@adacore.com> wrote:
>
> On Dec 10, 2024, Richard Biener <richard.guent...@gmail.com> wrote:
>
>>> On Mon, Dec 9, 2024 at 8:55 PM Alexandre Oliva <ol...@adacore.com> wrote:
>>> Regstrapped on x86_64-linux-gnu. The aarch64 CI had reported a
>>> regression with v3, so I'm also running a regstrap on aarch64-linux-gnu,
>>> but it will still be a while. Ok to install if it passes?
>
>> OK.
>
> Thanks. The aarch64-linux-gnu bootstrap revealed a latent surprise, in
> the form of a bootstrap failure, that this incremental patch addresses.
> I'll hold off from installing the larger patch without a fix for this.
>
>
> A bootstrap on aarch64-linux-gnu revealed that sometimes (for example,
> when building shorten_branches in final.cc) we will find such things
> as MEM <unsigned int>, where unsigned int happens to be a variant of
> the original unsigned int type, given 64-bit alignment. This unusual
> alignment circumstance caused get_best_mode to choose DImode instead
> of SImode, and that failed gimple verification because there aren't
> that many bits in the unsigned int object.
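To make the failure mode concrete (a hypothetical reduced illustration,
not the actual final.cc case): such an over-aligned variant can arise,
for instance, from an aligned typedef, where TYPE_ALIGN exceeds
TYPE_SIZE:

  /* Hypothetical reduced example, assuming the GNU aligned attribute.  */
  typedef unsigned int ui64 __attribute__ ((aligned (8)));
  /* TYPE_SIZE (ui64) is 32 bits, but TYPE_ALIGN (ui64) is 64 bits, so
     passing TYPE_ALIGN to get_best_mode for a 32-bit access allows it
     to return DImode, wider than the object itself.  */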
>
> Arrange for alignment to saturate at the inner object size to avoid
> tripping this error.
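IIUC, with the numbers from the failure above (assumed values, for
illustration only), the saturation plays out as:

  ll_align = 64, TYPE_SIZE (TREE_TYPE (ll_inner)) = 32
  64 > 32, so ll_align is halved to 32; 32 > 32 is false, loop exits
  get_best_mode is then capped at SImode for the 32-bit object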
>
> Regstrapping on x86_64-linux-gnu. Regstrapped an earlier version
> without lr_align on aarch64-linux-gnu. Ok to install?
>
>
> for gcc/ChangeLog
>
> * gimple-fold.cc (fold_truth_andor_for_ifcombine): Saturate
> align at inner object size.
> ---
> gcc/gimple-fold.cc | 39 +++++++++++++++++++++++++++------------
> 1 file changed, 27 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index a31fc283d51b0..967356a950192 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -8204,24 +8204,31 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
> to be relative to a field of that size. */
> first_bit = MIN (ll_bitpos, rl_bitpos);
> end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
> + HOST_WIDE_INT ll_align = TYPE_ALIGN (TREE_TYPE (ll_inner));
> + /* Guard from types with wider-than-size alignment. We must not widen the
> + load beyond its total size. This is rare. */
> + while (ll_align > BITS_PER_UNIT
> + && TYPE_SIZE (TREE_TYPE (ll_inner))
> + && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (ll_inner)))
> + && known_gt ((unsigned HOST_WIDE_INT)ll_align,
> + tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (ll_inner)))))
> + ll_align /= 2;
> if (get_best_mode (end_bit - first_bit, first_bit, 0, 0,
I think these 0, 0 args are supposed to indicate the maximum extent of the
resulting access - but widening to the largest mode fitting the alignment
boundary should be valid, so I wonder what the failure mode was?

Richard
> - TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
> - volatilep, &lnmode))
> + ll_align, BITS_PER_WORD, volatilep, &lnmode))
> l_split_load = false;
> else
> {
> /* Consider the possibility of recombining loads if any of the
> fields straddles across an alignment boundary, so that either
> part can be loaded along with the other field. */
> - HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner));
> HOST_WIDE_INT boundary = compute_split_boundary_from_align
> - (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
> + (ll_align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
>
> if (boundary < 0
> || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
> - align, BITS_PER_WORD, volatilep, &lnmode)
> + ll_align, BITS_PER_WORD, volatilep, &lnmode)
> || !get_best_mode (end_bit - boundary, boundary, 0, 0,
> - align, BITS_PER_WORD, volatilep, &lnmode2))
> + ll_align, BITS_PER_WORD, volatilep, &lnmode2))
> return 0;
>
> /* If we can't have a single load, but can with two, figure out whether
> @@ -8368,16 +8375,24 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
> and then we use two separate compares. */
> first_bit = MIN (lr_bitpos, rr_bitpos);
> end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
> + HOST_WIDE_INT lr_align = TYPE_ALIGN (TREE_TYPE (lr_inner));
> + /* Guard from types with wider-than-size alignment. We must not widen the
> + load beyond its total size. This is rare. */
> + while (lr_align > BITS_PER_UNIT
> + && TYPE_SIZE (TREE_TYPE (lr_inner))
> + && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (lr_inner)))
> + && known_gt ((unsigned HOST_WIDE_INT)lr_align,
> + tree_to_poly_uint64 (TYPE_SIZE
> + (TREE_TYPE (lr_inner)))))
> + lr_align /= 2;
> if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> - TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
> - volatilep, &rnmode))
> + lr_align, BITS_PER_WORD, volatilep, &rnmode))
> {
> /* Consider the possibility of recombining loads if any of the
> fields straddles across an alignment boundary, so that either
> part can be loaded along with the other field. */
> - HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (lr_inner));
> HOST_WIDE_INT boundary = compute_split_boundary_from_align
> - (align, lr_bitpos, lr_bitsize, rr_bitpos, rr_bitsize);
> + (lr_align, lr_bitpos, lr_bitsize, rr_bitpos, rr_bitsize);
>
> if (boundary < 0
> /* If we're to split both, make sure the split point is
> @@ -8386,9 +8401,9 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
> && (boundary - lr_bitpos
> != (lnbitpos + GET_MODE_BITSIZE (lnmode)) - ll_bitpos))
> || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
> - align, BITS_PER_WORD, volatilep, &rnmode)
> + lr_align, BITS_PER_WORD, volatilep, &rnmode)
> || !get_best_mode (end_bit - boundary, boundary, 0, 0,
> - align, BITS_PER_WORD, volatilep, &rnmode2))
> + lr_align, BITS_PER_WORD, volatilep, &rnmode2))
> return 0;
>
> r_split_load = true;
>
>
> --
> Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/
> Free Software Activist GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive