Joe Ramsay <joe.ram...@arm.com> writes:
> This patch improves code generation for EOR, ORR and AND on unpacked vectors 
> with SVE. The following function:
> void f (unsigned int *x, unsigned short *y, unsigned short *z) {
>   for (int i = 0; i < 7; ++i)
>     x[i] = (unsigned short) (y[i] & z[i]);
> }
>
> previously compiled to:
> ptrue   p1.d, vl3
> ld1h    z0.d, p1/z, [x1, #1, mul vl]
> ptrue   p0.b, vl32
> st1h    z0.d, p0, [sp, #1, mul vl]
> ld1h    z0.d, p1/z, [x2, #1, mul vl]
> st1h    z0.d, p0, [sp]
> ldr     x3, [x2]
> ldp     x4, x2, [sp]
> ldr     x1, [x1]
> and     x1, x3, x1
> and     x2, x2, x4
> str     x2, [sp]
> ld1h    z0.d, p0/z, [sp]
> str     x1, [sp]
> uxth    z0.s, p0/m, z0.s
> st1w    z0.d, p1, [x0, #1, mul vl]
> ld1h    z0.d, p0/z, [sp]
> uxth    z0.s, p0/m, z0.s
> st1w    z0.d, p0, [x0]
> add     sp, sp, 16
> ret
>
> and now compiles to:
> ptrue   p0.s, vl7
> ptrue   p1.b, vl32
> ld1h    z1.s, p0/z, [x1]
> ld1h    z0.s, p0/z, [x2]
> and     z0.d, z0.d, z1.d
> uxth    z0.s, p1/m, z0.s
> st1w    z0.s, p0, [x0]
> ret

LGTM thanks.  Pushed to master.

Richard
