I don't think it makes any difference, but:

Richard Biener <rguent...@suse.de> writes:
> @@ -2151,7 +2151,16 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
>                    access excess elements.
>                    ???  Enhancements include peeling multiple iterations
>                    or using masked loads with a static mask.  */
> -               || (group_size * cvf) % cnunits + group_size - gap < cnunits))
> +               || ((group_size * cvf) % cnunits + group_size - gap < cnunits
> +                   /* But peeling a single scalar iteration is enough if
> +                      we can use the next power-of-two sized partial
> +                      access.  */
> +                   && ((cremain = (group_size * cvf - gap) % cnunits), true

...this might be less surprising as:

                      && ((cremain = (group_size * cvf - gap) % cnunits, true)

in terms of how the &&s line up.
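
FWIW, the two spellings are equivalent; a throwaway standalone sketch
(made-up values, not the GCC code) showing that the comma operator does
the assignment for its side effect and yields true either way:

  #include <cassert>

  int main ()
  {
    unsigned cremain = 0;
    unsigned group_size = 3, cvf = 4, gap = 1, cnunits = 8;

    /* Patch-style grouping: the comma sits outside the assignment's
       own parentheses.  */
    bool a = ((cremain = (group_size * cvf - gap) % cnunits), true)
             && cremain != 0;

    /* Suggested grouping: the comma and the assignment share one set
       of parentheses, so the && operands line up.  */
    bool b = (cremain = (group_size * cvf - gap) % cnunits, true)
             && cremain != 0;

    assert (a == b);  /* (3*4 - 1) % 8 == 3, so both are true.  */
    return 0;
  }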

Thanks,
Richard

> +                       && ((cpart_size = (1 << ceil_log2 (cremain)))
> +                           != cnunits)
> +                       && vector_vector_composition_type
> +                            (vectype, cnunits / cpart_size,
> +                             &half_vtype) == NULL_TREE))))
>           {
>             if (dump_enabled_p ())
>               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
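
For readers following along: the new check computes the number of
elements remaining after the gap and rounds it up to a power of two; if
a vector type of that smaller size can be composed, peeling a single
scalar iteration is enough.  A rough standalone sketch with made-up
values and a stand-in for GCC's ceil_log2:

  #include <cstdio>

  /* Stand-in for GCC's ceil_log2: smallest l with (1u << l) >= x.  */
  static unsigned
  my_ceil_log2 (unsigned x)
  {
    unsigned l = 0;
    while ((1u << l) < x)
      l++;
    return l;
  }

  int main ()
  {
    unsigned group_size = 3, cvf = 4, gap = 1, cnunits = 8;
    unsigned cremain = (group_size * cvf - gap) % cnunits;  /* 11 % 8 = 3 */
    unsigned cpart_size = 1u << my_ceil_log2 (cremain);     /* -> 4 */
    printf ("cremain=%u cpart_size=%u\n", cremain, cpart_size);
    /* The patch then asks vector_vector_composition_type for a vector
       built from cnunits / cpart_size (= 2) such pieces; if one exists,
       the smaller partial access avoids touching excess elements.  */
    return 0;
  }
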
> @@ -11599,6 +11608,27 @@ vectorizable_load (vec_info *vinfo,
>                             gcc_assert (new_vtype
>                                         || LOOP_VINFO_PEELING_FOR_GAPS
>                                              (loop_vinfo));
> +                         /* But still reduce the access size to the next
> +                            required power-of-two so peeling a single
> +                            scalar iteration is sufficient.  */
> +                         unsigned HOST_WIDE_INT cremain;
> +                         if (remain.is_constant (&cremain))
> +                           {
> +                             unsigned HOST_WIDE_INT cpart_size
> +                               = 1 << ceil_log2 (cremain);
> +                             if (known_gt (nunits, cpart_size)
> +                                 && constant_multiple_p (nunits, cpart_size,
> +                                                         &num))
> +                               {
> +                                 tree ptype;
> +                                 new_vtype
> +                                   = vector_vector_composition_type (vectype,
> +                                                                     num,
> +                                                                     &ptype);
> +                                 if (new_vtype)
> +                                   ltype = ptype;
> +                               }
> +                           }
>                         }
>                     }
>                   tree offset
