Joe Ramsay <joe.ram...@arm.com> writes:
> This patch improves code generation for EOR, ORR and AND on unpacked vectors
> with SVE.  The following function:
>
>   void f (unsigned int *x, unsigned short *y, unsigned short *z) {
>     for (int i = 0; i < 7; ++i)
>       x[i] = (unsigned short) (y[i] & z[i]);
>   }
>
> previously compiled to
>         ptrue   p1.d, vl3
>         ld1h    z0.d, p1/z, [x1, #1, mul vl]
>         ptrue   p0.b, vl32
>         st1h    z0.d, p0, [sp, #1, mul vl]
>         ld1h    z0.d, p1/z, [x2, #1, mul vl]
>         st1h    z0.d, p0, [sp]
>         ldr     x3, [x2]
>         ldp     x4, x2, [sp]
>         ldr     x1, [x1]
>         and     x1, x3, x1
>         and     x2, x2, x4
>         str     x2, [sp]
>         ld1h    z0.d, p0/z, [sp]
>         str     x1, [sp]
>         uxth    z0.s, p0/m, z0.s
>         st1w    z0.d, p1, [x0, #1, mul vl]
>         ld1h    z0.d, p0/z, [sp]
>         uxth    z0.s, p0/m, z0.s
>         st1w    z0.d, p0, [x0]
>         add     sp, sp, 16
>         ret
>
> and now compiles to:
>         ptrue   p0.s, vl7
>         ptrue   p1.b, vl32
>         ld1h    z1.s, p0/z, [x1]
>         ld1h    z0.s, p0/z, [x2]
>         and     z0.d, z0.d, z1.d
>         uxth    z0.s, p1/m, z0.s
>         st1w    z0.s, p0, [x0]
>         ret

LGTM, thanks.  Pushed to master.

Richard