https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64909
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |jakub at gcc dot gnu.org,
                   |        |kyukhin at gcc dot gnu.org
             Blocks|        |53947

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I'm with H.J. here: I can't reproduce anything like the code you are showing; the loop is vectorized normally.

But what we can see is that for e.g. -O3 -mavx we choose a vectorization factor of 8. That choice follows from the mix of 16-bit and 32-bit types used in the loop: before AVX2 we can mostly use only V4SImode and V8HImode. clang, by contrast, apparently vectorizes it with a vectorization factor of 4 instead of 8, and as the loop has a constant 12 iterations, doing it that way is beneficial.

So perhaps the question is why SLP after cunroll hasn't vectorized the unrolled scalar tail loop with a vectorization factor of 4:

pr64909.c:8:11: note: not vectorized: not enough data-refs in basic block.

It is true that for HImode we indeed can't fill a V8HImode vector, but the loaded value is used only immediately in an extension, which normally looks like:

  vect__4.7_30 = MEM[(short unsigned int *)vectp_a.6_27];
  vect__5.8_31 = [vec_unpack_lo_expr] vect__4.7_30;
  vect__5.8_32 = [vec_unpack_hi_expr] vect__4.7_30;

so all we'd need is the ability to emit a V4HImode load followed solely by a vec_unpack_lo_expr from it, instead of both vec_unpack_{lo,hi}_expr.