https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63175

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #15)
> Btw, first of all unaligned stores are not supported according to the target's
> vectorization hook, thus you'd need to peel the loop to make the store
> aligned which for some reason doesn't happen.

Quite obvious: the loop iterates 8 times but the vectorization factor is 8
as well, so if we peel off an iteration to align the destination, the
vectorized loop will never be entered.
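
The arithmetic behind that can be sketched in a few lines of C. This is only
an illustration, not GCC code: the helper names are made up, and the trip
count of 16 used in the note below is an assumption inferred from the later
dump (7 prologue + 8 vector + 1 epilogue iterations).

```c
#include <assert.h>

/* Hypothetical helper, not taken from GCC: number of full vector
   iterations left once `peel` scalar iterations have been split off
   into an alignment prologue.  */
static unsigned vector_iterations(unsigned niters, unsigned vf, unsigned peel)
{
    assert(vf > 0 && peel <= niters);
    return (niters - peel) / vf;
}

/* Hypothetical helper: scalar iterations left over for the epilogue.  */
static unsigned epilogue_iterations(unsigned niters, unsigned vf, unsigned peel)
{
    assert(vf > 0 && peel <= niters);
    return (niters - peel) % vf;
}
```

With niters = 8, vf = 8 and a single peeled iteration, vector_iterations()
yields 0, so the vector body is dead. With an assumed niters = 16 and 7 peeled
iterations, it yields one vector iteration plus one epilogue iteration.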

Why is the loop bound to i != 16 / sizeof *s?

>  But when peeled you certainly will see
> byte/short/word stores at least.

For example, when I increase the iteration count, I get the following for
copy_short_0_1:

.L.copy_Type_0_1:
        addis 6,2,.LANCHOR0@toc@ha
        addis 7,2,.LANCHOR1@toc@ha
        addi 6,6,.LANCHOR0@toc@l
        addi 7,7,.LANCHOR1@toc@l
        li 8,7
        addi 9,6,2
        mr 10,7
        mtctr 8
        .p2align 4,,15
.L2:
        addi 10,10,2
        lhz 8,-2(10)
        addi 9,9,2
        sth 8,-2(9)
        bdnz .L2
        addi 8,7,14
        addi 7,7,29
        neg 5,8
        lvx 1,0,8
        lvx 0,0,7
        li 7,16
        lvsr 13,0,5
        addi 8,10,14
        addi 9,9,14
        addi 10,10,16
        vperm 0,1,0,13
        stvx 0,6,7
        .p2align 4,,15
.L3:
        lhzu 7,2(8)
        cmpld 7,10,8
        sthu 7,2(9)
        bne+ 7,.L3
        blr

The cost model should probably reject this, but it does not:

t.c:36:1: note: Cost model analysis:
  Vector inside of loop cost: 3
  Vector prologue cost: 17
  Vector epilogue cost: 2
  Scalar iteration cost: 2
  Scalar outside cost: 0
  Vector outside cost: 19
  prologue iterations: 7
  epilogue iterations: 1
  Calculated minimum iters for profitability: 10
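
For reference, the figures in this dump can be reproduced with a small C
sketch. It is a reconstruction under assumptions, not code copied from the
vectorizer: the function name and parameter names are hypothetical, the shape
of the break-even formula is written from memory of how the vectorizer
estimates profitability, and the vectorization factor of 8 is taken from the
discussion above. Vector outside cost is simply prologue plus epilogue cost
(17 + 2 = 19).

```c
#include <assert.h>

/* Hypothetical reconstruction of a break-even estimate: the smallest
   iteration count at which the vector version is assumed to beat the
   scalar one.  All costs are abstract units as printed in the dump.  */
static int min_profitable_iters(int vec_inside, int vec_outside,
                                int scalar_iter, int scalar_outside,
                                int vf, int prologue_iters,
                                int epilogue_iters)
{
    /* The estimate only makes sense if one vector iteration is cheaper
       than the vf scalar iterations it replaces.  */
    assert(scalar_iter * vf > vec_inside);

    int n = ((vec_outside - scalar_outside) * vf
             - vec_inside * prologue_iters
             - vec_inside * epilogue_iters)
            / (scalar_iter * vf - vec_inside);

    /* Integer division rounds down; bump by one if scalar is still
       no more expensive at n iterations.  */
    if (scalar_iter * vf * n
        <= vec_inside * n + (vec_outside - scalar_outside) * vf)
        n++;

    return n;
}
```

Calling min_profitable_iters(3, 19, 2, 0, 8, 7, 1) returns 10, matching the
last line of the dump. If the loop really runs 16 iterations (an assumption,
see the prologue and epilogue counts), that is above the threshold, which
would be consistent with the vectorization not being rejected.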
