O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

rguenth at gcc dot gnu.org Thu, 10 Dec 2015 04:28:53 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #36951|0                           |1
        is obsolete|                            |

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 36982
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36982&action=edit
patch for testing

Ok, so it seems general interleaving isn't going to be more profitable unless
the permutes done by SLP are way more expensive than those done by interleaving
(can happen on x86_64...).  That's because with SLP we never need to permute
the stores, only the loads (OTOH possibly with a different permutation for each
input vector use), while interleaving will do number-of-input-vectors times
log (group-size) permutes.  It's hard to compute a good estimate with
the current SLP data structures so I'm leaving interleaving as-is.
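
To make the permute-count comparison concrete, here is a rough sketch in C.
It is not code from the vectorizer: the function names are made up for
illustration, log is assumed to be base 2 rounded up, and "one permute per
input vector use" is assumed as the worst case for SLP.  It only counts
permutes and says nothing about how expensive each one is on a given target.

```c
#include <assert.h>

/* Smallest k with 2^k >= n (hypothetical helper, not a GCC function). */
static unsigned ceil_log2(unsigned n)
{
    assert(n >= 1 && n <= (1u << 31));
    unsigned k = 0;
    while ((1u << k) < n)
        k++;
    return k;
}

/* Interleaving: number of input vectors times log2 (group-size) permutes. */
static unsigned interleave_permutes(unsigned input_vectors, unsigned group_size)
{
    return input_vectors * ceil_log2(group_size);
}

/* SLP: stores are never permuted; assume at worst one load permute
   per input vector use. */
static unsigned slp_load_permutes(unsigned input_vector_uses)
{
    return input_vector_uses;
}
```

With 4 input vectors and a group size of 4, interleave_permutes gives
4 * 2 = 8, while slp_load_permutes gives 4 for 4 input vector uses.  SLP
therefore only loses if its individual permutes are more than twice as
expensive in this case.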

For load-lanes/store-lanes I have attached an updated patch.
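
For readers unfamiliar with load-lanes, a scalar model of the de-interleaving
load is sketched below.  The lane count, vector length and function name are
assumptions picked for illustration; they are not taken from the patch or the
testcase.  The indexing follows the usual description of an interleaved load
of several vectors: element j of output vector i comes from mem[j * LANES + i].

```c
#include <stddef.h>

enum { LANES = 3, NUNITS = 4 };  /* assumed: 3 vectors of 4 elements each */

/* Scalar model of a load-lanes operation: split an interleaved array
   into LANES separate vectors without any explicit permutes. */
static void load_lanes_model(const int *mem, int out[LANES][NUNITS])
{
    for (size_t j = 0; j < NUNITS; j++)
        for (size_t i = 0; i < LANES; i++)
            out[i][j] = mem[j * LANES + i];
}
```

With mem holding 0..11, out[0] becomes {0, 3, 6, 9}, out[1] becomes
{1, 4, 7, 10} and out[2] becomes {2, 5, 8, 11}.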

It still doesn't include a check against statically determined very low
iteration counts (nor does it factor in peeling for gaps that load/store-lanes
may require) because at this point it would be a heuristic as well
(alignment peeling isn't computed yet).

Thus this would be my "final" patch.  Can you test it please and report back?

[Bug tree-optimization/68707] [6 Regression] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES
