[Bug testsuite/63175] [4.9/5 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c scan-tree-dump-times slp2" basic block vectorized using SLP" 1

msebor at gcc dot gnu.org Mon, 02 Mar 2015 21:11:53 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63175


--- Comment #24 from Martin Sebor <msebor at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #16)
> Why is the loop bound to i != 16 / sizeof *s?

The upper bound is intended to make the copied sequence fit into one vector
register, irrespective of the size of the array element.

The vector load and store instructions tolerate unaligned accesses, and there
are permute instructions that combine the contents of two vector registers into
a single one to compensate for unaligned reads or writes.  I'm not sure it
makes sense to expect unaligned copies of only a single vector register's
worth of data to be vectorized (which is what my proposed tests for char and
short expect), but I would expect larger unaligned copies (i.e., multiples of
16 bytes) to benefit from vectorization.  In my experiments I've seen no
evidence of GCC attempting to vectorize such copies, but I need to do some
more research to understand why.

(In reply to comment #23)

The test uses -maltivec, and that's what I've been using as well.  But the
Power ISA book classifies lxvw4x and stxvw4x as VSX instructions, so perhaps
they shouldn't be emitted without -mvsx.  That said, GCC 5.0 doesn't emit
them even with -mvsx.

