On Fri, May 03, 2019 at 12:47:39PM +0200, Richard Biener wrote:
> On Wed, Dec 12, 2018 at 11:54 AM Richard Biener <rguent...@suse.de> wrote:
> >
> >
> > The following improves x264 vectorization by avoiding peeling for gaps:
> > when the upper half of a vector is unused we can load the lower part
> > only and fill the upper half with zeros.  (This is what x86 does
> > automatically; GIMPLE doesn't allow us to leave the upper half
> > undefined as RTL would by using subregs.)
> >
> > The implementation is a little bit awkward, as for optimal GIMPLE
> > code-generation and costing we'd like to go the strided load path
> > instead.  That proves somewhat difficult, though, so the following
> > is easier but doesn't fill out the re-align paths or the masked
> > paths (at least the fully masked path would never need peeling for
> > gaps).
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, tested with
> > SPEC CPU 2006 and 2017 with the expected (~4%) improvement for
> > 625.x264_s.  Didn't see any positive or negative effects elsewhere.
> >
> > Queued for GCC 10.
> 
> Applied as r270847.

This regressed
FAIL: gcc.target/i386/avx512vl-pr87214-1.c execution test
(AVX512VL hw or SDE is needed to reproduce).

> > 2018-12-12  Richard Biener  <rguent...@suse.de>
> >
> >         * tree-vect-stmts.c (get_group_load_store_type): Avoid
> >         peeling for gaps by loading only lower halves of vectors
> >         if possible.
> >         (vectorizable_load): Likewise.
> >
> >         * gcc.dg/vect/slp-reduc-sad-2.c: New testcase.

        Jakub
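
For readers following the thread, the idea in the quoted description
(load only the lower half of a vector and zero the unused upper half,
so the last iteration never reads past the end of the data) can be
sketched in portable C.  This is only an illustration, not the code the
vectorizer emits: the vec16 type, the helper names and the 8-byte row
width are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 16-byte "vector" of which only the lower 8 lanes are used. */
enum { VEC_BYTES = 16, HALF_BYTES = 8 };

typedef struct { uint8_t lane[VEC_BYTES]; } vec16;

/* Read exactly HALF_BYTES from p and zero the upper lanes.  A full
   VEC_BYTES load here would touch 8 bytes beyond the used data, which
   is the access that peeling for gaps normally guards against.  */
static vec16 load_low_half_zero_upper(const uint8_t *p)
{
    vec16 v;
    memset(&v, 0, sizeof v);        /* upper half becomes zero */
    memcpy(v.lane, p, HALF_BYTES);  /* only the lower half is read */
    return v;
}

/* Sum of absolute differences over rows of 8 pixels (illustrative only).
   The zeroed upper lanes of both operands contribute nothing.  */
static unsigned sad_8xN(const uint8_t *a, const uint8_t *b,
                        int stride, int rows)
{
    unsigned sum = 0;
    for (int r = 0; r < rows; r++) {
        vec16 va = load_low_half_zero_upper(a + r * stride);
        vec16 vb = load_low_half_zero_upper(b + r * stride);
        for (int i = 0; i < VEC_BYTES; i++)
            sum += (unsigned)abs(va.lane[i] - vb.lane[i]);
    }
    return sum;
}
```

Because every load reads only 8 bytes, the final row needs no scalar
epilogue even when the buffer ends right after it.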
