On Tue, 11 Jun 2024, Richard Sandiford wrote:

> Don't think it makes any difference, but:
> 
> Richard Biener <rguent...@suse.de> writes:
> > @@ -2151,7 +2151,16 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> >                  access excess elements.
> >                  ???  Enhancements include peeling multiple iterations
> >                  or using masked loads with a static mask.  */
> > -             || (group_size * cvf) % cnunits + group_size - gap < cnunits))
> > +             || ((group_size * cvf) % cnunits + group_size - gap < cnunits
> > +                 /* But peeling a single scalar iteration is enough if
> > +                    we can use the next power-of-two sized partial
> > +                    access.  */
> > +                 && ((cremain = (group_size * cvf - gap) % cnunits), true
> 
> ...this might be less surprising as:
> 
>                     && ((cremain = (group_size * cvf - gap) % cnunits, true)
> 
> in terms of how the &&s line up.

Yeah - I'll fix before pushing.

Thanks,
Richard.

> Thanks,
> Richard
> 
> > +                     && ((cpart_size = (1 << ceil_log2 (cremain)))
> > +                         != cnunits)
> > +                     && vector_vector_composition_type
> > +                          (vectype, cnunits / cpart_size,
> > +                           &half_vtype) == NULL_TREE))))
> >         {
> >           if (dump_enabled_p ())
> >             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > @@ -11599,6 +11608,27 @@ vectorizable_load (vec_info *vinfo,
> >                           gcc_assert (new_vtype
> >                                       || LOOP_VINFO_PEELING_FOR_GAPS
> >                                            (loop_vinfo));
> > +                       /* But still reduce the access size to the next
> > +                          required power-of-two so peeling a single
> > +                          scalar iteration is sufficient.  */
> > +                       unsigned HOST_WIDE_INT cremain;
> > +                       if (remain.is_constant (&cremain))
> > +                         {
> > +                           unsigned HOST_WIDE_INT cpart_size
> > +                             = 1 << ceil_log2 (cremain);
> > +                           if (known_gt (nunits, cpart_size)
> > +                               && constant_multiple_p (nunits, cpart_size,
> > +                                                       &num))
> > +                             {
> > +                               tree ptype;
> > +                               new_vtype
> > +                                 = vector_vector_composition_type (vectype,
> > +                                                                   num,
> > +                                                                   &ptype);
> > +                               if (new_vtype)
> > +                                 ltype = ptype;
> > +                             }
> > +                         }
> >                       }
> >                   }
> >                 tree offset
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)