https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102066
            Bug ID: 102066
           Summary: aarch64: Suboptimal addressing modes for SVE LD1W, ST1W
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
                CC: rsandifo at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

For the code:

#include <arm_sve.h>

void foo(int n, float *x, float *y) {
    for (unsigned i=0; i<n; i+=svcntw()) {
        svfloat32_t val = svld1_f32(svptrue_b8(), &x[i]);
        svst1_f32(svptrue_b8(), &y[i], val);
    }
}

at -O3 -march=armv8.2-a+sve GCC generates:

foo:
        cbz     w0, .L1
        mov     w4, 0
        cntw    x6
        ptrue   p0.b, all
.L3:
        ubfiz   x3, x4, 2, 32
        add     w4, w4, w6
        add     x5, x1, x3
        add     x3, x2, x3
        ld1w    z0.s, p0/z, [x5]
        st1w    z0.s, p0, [x3]
        cmp     w4, w0
        bcc     .L3
.L1:
        ret

but it could be making better use of the scaled addressing modes:

foo:                                    // @foo
        cbz     w0, .LBB0_3
        mov     w8, wzr
        ptrue   p0.b
        cntw    x9
.LBB0_2:                                // %for.body
        mov     w10, w8
        ld1w    { z0.s }, p0/z, [x1, x10, lsl #2]
        add     w8, w8, w9
        cmp     w8, w0
        st1w    { z0.s }, p0, [x2, x10, lsl #2]
        b.lo    .LBB0_2
.LBB0_3:                                // %for.cond.cleanup
        ret

I guess the predicates and constraints in @aarch64_pred_mov<mode> in
aarch64-sve.md should be extended to allow the scaled addressing modes.
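For what it's worth, part of the per-iteration overhead in the GCC output (the
ubfiz) comes from zero-extending the 32-bit unsigned index before scaling it.
A hypothetical variant of the reproducer with a 64-bit induction variable (not
part of the original report, and not checked against a build) isolates the
addressing-mode question: even without the zero-extension, folding the
"base + index, LSL #2" scaling into the LD1W/ST1W still depends on the
constraints mentioned above.

#include <arm_sve.h>
#include <stdint.h>

/* Hypothetical variant, not from the report: a 64-bit induction variable
   removes the ubfiz zero-extension of the index.  Whether the LSL #2
   scaling is then folded into the LD1W/ST1W addressing mode still depends
   on what the SVE move/load patterns accept.  */
void foo64(int n, float *x, float *y) {
    for (uint64_t i = 0; i < (unsigned)n; i += svcntw()) {
        svfloat32_t val = svld1_f32(svptrue_b8(), &x[i]);
        svst1_f32(svptrue_b8(), &y[i], val);
    }
}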