https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66433
            Bug ID: 66433
           Summary: Arm NEON postincrement optimization missed
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: y.usishchev at samsung dot com
  Target Milestone: ---

Created attachment 35701
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35701&action=edit
test with vld and vst

GCC from trunk, configured with --target=armv7l-tizen-linux-gnueabi, does not
generate autoincrement addressing for vld/vst instructions when the attached
testcase is compiled with "-O2 -mfpu=neon".

The auto-inc-dec pass misses the optimization opportunity for vld/vst
instructions. For code such as

    for (...) {  // some loop
        s0_32x4 = vld1q_u32(s);
        s1_32x4 = vld1q_u32(s+4);
        s+=8;
        ...
    }

gcc generates

    vld1.32 {d6-d7}, [r1]
    add.w   r4, r1, #16
    adds    r1, #32
    vld1.32 {d28-d29}, [r4]

instead of

    vld1.32 {d6-d7}, [r1]!
    vld1.32 {d28-d29}, [r1]!

This is presumably caused by a wrong cost estimate: a vld1.32 instruction
without increment costs 4, but with increment it costs 16
(gcc/config/arm/arm.c:9415):

    case MEM:
      if (REG_P (XEXP (x, 0)))
        *cost = COSTS_N_INSNS (1);
      ...
      else
        *cost = COSTS_N_INSNS (ARM_NUM_REGS (mode));
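
The two figures quoted above (4 and 16) can be reproduced with a small standalone model of the cost arithmetic. This is only a sketch, not GCC code. COSTS_N_INSNS is assumed to expand to N * 4 as in gcc/rtl.h, and ARM_NUM_INTS is assumed to round a byte size up to 32-bit words as in gcc/config/arm/arm.h. The helper mem_address_cost and its parameters are hypothetical names introduced for illustration, and the 16-byte access size is an assumption based on vld1q_u32 loading a 128-bit q register.

```c
#include <assert.h>

/* Assumed to mirror gcc/rtl.h: cost of N "typical" instructions. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Assumed to mirror gcc/config/arm/arm.h on a 32-bit ARM target:
   number of 4-byte words needed to hold SIZE bytes. */
#define UNITS_PER_WORD 4
#define ARM_NUM_INTS(SIZE) (((SIZE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)

/* Hypothetical helper that models only the two branches quoted in the
   report. address_is_plain_reg is nonzero for an address like [r1] and
   zero for anything else, including the post-increment form [r1]!.
   mode_size is the size in bytes of the memory access. */
static int mem_address_cost(int address_is_plain_reg, int mode_size)
{
    assert(mode_size > 0);

    if (address_is_plain_reg)
        return COSTS_N_INSNS(1);

    /* Fallback branch: cost scales with the number of words accessed. */
    return COSTS_N_INSNS(ARM_NUM_INTS(mode_size));
}
```

Under these assumptions, mem_address_cost(1, 16) yields 4 and mem_address_cost(0, 16) yields 16, so the post-increment form looks four times as expensive as the plain register form even though it replaces a separate add instruction.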