Hi,

One of the problems with ivopts is that the auto-increment modelling only takes into account whether HAVE_PRE_INCREMENT and friends are defined for the architecture. However, on ARM the VFP addressing modes don't really support the PRE_INCREMENT and POST_DECREMENT forms, while ivopts has a bias towards preferring pre-increment forms over everything else.

The attached patch attempts to fix this. In general it makes things better on ARM, where there are a large number of cases of rather embarrassing code generation around array accesses of floating point values: to honor the chosen auto-increment form, the compiler is forced to move things back and forth between floating point and integer registers, and the like.

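To illustrate the idea (this is not the attached patch, and every identifier below is a hypothetical stand-in rather than a GCC name), here is a minimal sketch of the difference between a per-target answer and a per-mode answer. It assumes that the target reports all four auto-increment forms, that float and double values live in VFP registers when not in soft-float mode, and that VFP loads and stores only accept the post-increment and pre-decrement forms:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not GCC's own types. */
enum sketch_mode { SKETCH_SImode, SKETCH_SFmode, SKETCH_DFmode };
enum sketch_autoinc {
  SKETCH_PRE_INC, SKETCH_POST_INC, SKETCH_PRE_DEC, SKETCH_POST_DEC
};

/* Per-target answer, in the spirit of the HAVE_* macros: one answer per
   form, regardless of the mode being accessed.  Assumption: the target
   reports all four forms as available.  */
static bool
sketch_have_autoinc (enum sketch_autoinc form)
{
  (void) form;
  return true;
}

/* Assumption: float and double values are accessed through VFP loads and
   stores unless we are in soft-float mode.  */
static bool
sketch_mode_uses_vfp (enum sketch_mode mode, bool soft_float)
{
  return !soft_float && (mode == SKETCH_SFmode || mode == SKETCH_DFmode);
}

/* Per-mode answer: start from the per-target answer, then rule out the
   forms that VFP accesses cannot use (pre-increment, post-decrement).  */
static bool
sketch_autoinc_ok_for_mode (enum sketch_mode mode, enum sketch_autoinc form,
                            bool soft_float)
{
  if (!sketch_have_autoinc (form))
    return false;
  if (sketch_mode_uses_vfp (mode, soft_float))
    return form == SKETCH_POST_INC || form == SKETCH_PRE_DEC;
  return true;
}
```

With a query of this shape, a float access in hard-float mode would only ever be offered post-increment or pre-decrement candidates, while integer accesses keep all four forms.
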
The canonical example of the problem is:

    void foo (float *x, float *y, float *z, float *m, int l)
    {
      int i;
      for (i = 0; i < l; i++)
        {
          z[i] = x[i] * y[i] + m[i];
        }
    }

Before the patch we generate:

            sub     r0, r0, #4
            sub     r1, r1, #4
            sub     r3, r3, #4
            add     ip, r2, ip, asl #2
    .L3:
            add     r3, r3, #4
            add     r0, r0, #4
            flds    s15, [r3, #0]
            flds    s13, [r0, #0]
            add     r1, r1, #4
            flds    s14, [r1, #0]
            fmacs   s15, s13, s14
            mov     r4, r3
            fstmias r2!, {s15}
            cmp     r2, ip
            bne     .L3
    .L1:
            ldmfd   sp!, {r4}
            bx      lr

and after the patch we generate:

    foo:
            @ args = 4, pretend = 0, frame = 0
            @ frame_needed = 0, uses_anonymous_args = 0
            @ link register save eliminated.
            ldr     ip, [sp, #0]
            cmp     ip, #0
            bxle    lr
            add     ip, r0, ip, asl #2
    .L3:
            fldmias r0!, {s13}
            fldmias r1!, {s14}
            fldmias r3!, {s15}
            fmacs   s15, s13, s14
            cmp     r0, ip
            fstmias r2!, {s15}
            bne     .L3
            bx      lr

In general, ivopts could do with some TLC in this area. Looking at the code generated for most of SPEC2k, I see a general improvement in performance on an A9 board, with the large number of transfers back and forth between VFP and integer registers much reduced. I saw up to a 6% improvement in mgrid, 3% in facerec, and overall up to a 1% improvement when this patch was applied to the Linaro 4.6 tree. Looking at object files with the same patch applied on FSF trunk, I see similar transformations to those on the 4.6 tree. I see some funny behaviour with twolf, where there is noise in the results, and I'm not confident of that particular result.

In the interest of full disclosure: while looking at mgrid I noticed a few cases where we were moving more values from the integer side to the VFP side, but overall I think this patch does more good than harm. These appeared to be around the areas where a floating point array was being zero initialized.
Given the VFP instruction set doesn't really have a zero initializer form, we were moving the value 0 into an integer register and then moving it into a VFP register, rather than just choosing to store from the integer register. I am not yet sure why that is happening and that's something I'm investigating.

Before that, I wanted some feedback on this patch as it stands today, as I believe it's reached a stage where it appears to be performing reasonably well. I did experiment with costs, and more generally with trying to turn off these auto-increment forms for the FP modes when we are not in soft-float mode, but nothing appeared to behave as well as the attached patch.

Thoughts and comments would be welcome. I don't know of any other architectures where this will be applicable.

Regards,
Ramana

gcc/

	* tree-ssa-loop-ivopts.c (add_autoinc_candidates, get_address_cost):
	Replace use of HAVE_{POST/PRE}_{INCREMENT/DECREMENT} with
	USE_{LOAD/STORE}_{PRE/POST}_{INCREMENT/DECREMENT} appropriately.
	* config/arm/arm.h (ARM_AUTOINC_VALID_FOR_MODE_P): New.
	(USE_LOAD_POST_INCREMENT): Define.
	(USE_LOAD_PRE_INCREMENT): Define.
	(USE_LOAD_POST_DECREMENT): Define.
	(USE_LOAD_PRE_DECREMENT): Define.
	(USE_STORE_PRE_DECREMENT): Define.
	(USE_STORE_PRE_INCREMENT): Define.
	(USE_STORE_POST_DECREMENT): Define.
	(USE_STORE_POST_INCREMENT): Define.
	(ARM_POST_INC): Define.
	(ARM_PRE_INC): Define.
	(ARM_PRE_DEC): Define.
	(ARM_POST_DEC): Define.
	* config/arm/arm-protos.h (arm_autoinc_modes_ok_p): Declare.
	* config/arm/arm.c (arm_autoinc_modes_ok_p): Define.