https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834
--- Comment #7 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Thanks for looking at this.

(In reply to kugan from comment #6)
>       cmp     w3, 0
>       ble     .L1
>       sub     w3, w3, #1
>       mov     x4, 0
>       cntw    x5
>       ptrue   p1.s, all
>       lsr     w3, w3, 1
>       add     w3, w3, 1
>       whilelo p0.s, xzr, x3
>       .p2align 3,,7
> .L3:
>       ld2w    {z4.s - z5.s}, p0/z, [x1, x4, lsl 2]
>       ld2w    {z2.s - z3.s}, p0/z, [x2, x4, lsl 2]
>       add     z0.s, z4.s, z2.s
>       sub     z1.s, z5.s, z3.s
>       st2w    {z0.s - z1.s}, p0, [x0, x4, lsl 2]
>       whilelo p0.s, x5, x3
>       incb    x4, all, mul #2
>       incw    x5
>       ptest   p1, p0.b
>       bne     .L3
> .L1:
>       ret
>       .cfi_endproc

This doesn't look right.  x4 is an index (the loads and the store
scale it with "lsl 2"), so it should be incremented by the number
of words in two vectors, rather than the number of bytes in two
vectors, which is what the incb does.
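
To make the arithmetic concrete, here is a small sketch that models
only the address computation of the quoted loop.  It is not GCC code.
The 32-byte (256-bit) vector length and the helper names are
assumptions chosen for illustration.  SVE vector lengths vary by
implementation.

```python
# Model of the addressing mode [base, x4, lsl 2]: x4 is a word index,
# so the byte offset is x4 * 4.
WORD_BYTES = 4  # ld2w/st2w operate on 32-bit elements


def incb_step(vl_bytes):
    """Increment applied by 'incb x4, all, mul #2': bytes in two vectors."""
    return 2 * vl_bytes


def incw_step(vl_bytes):
    """Increment a word index needs: 32-bit words in two vectors."""
    return 2 * (vl_bytes // WORD_BYTES)


def byte_offsets(iterations, step):
    """Byte offsets addressed on successive iterations for a given x4 step."""
    x4 = 0
    offsets = []
    for _ in range(iterations):
        offsets.append(x4 * WORD_BYTES)  # effect of 'lsl 2'
        x4 += step
    return offsets


if __name__ == "__main__":
    vl = 32  # assumed vector length in bytes (hypothetical 256-bit SVE)
    # Each ld2w/st2w touches two vectors, i.e. 2 * vl bytes per iteration.
    print("bytes consumed per iteration:", 2 * vl)
    print("offsets with word step:", byte_offsets(3, incw_step(vl)))
    print("offsets with byte step:", byte_offsets(3, incb_step(vl)))
```

With the word-sized step, successive offsets advance by exactly the
two vectors' worth of bytes that each ld2w/st2w consumes.  With the
byte-sized step they advance four times as far, so most of the data
would be skipped.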
