On 29/11/13 11:46, Yufeng Zhang wrote:
> On 11/29/13 07:52, Bin.Cheng wrote:
>> After thinking twice, I some kind of think we should not re-associate
>> addresses during expanding, because of lacking of context information.
>> Take base + scaled_index + offset as an example in PR57540, we just
>> don't know if "base+offset" is loop invariant from either backend or
>> RTL expander.
>
> I'm getting less convinced by re-associating base with offset
> unconditionally. One counter example is
>
> typedef int arr_1[20];
> void foo (arr_1 a1, int i)
> {
>   a1[i+10] = 1;
> }
>
> I'm experimenting a patch to get the immediate offset in the above
> example to be the last addend in the address computing (as mentioned in
> http://gcc.gnu.org/ml/gcc/2013-11/msg00581.html), aiming to get the
> following code-gen:
>
>     add r1, r0, r1, asl #2
>     mov r3, #1
>     str r3, [r1, #40]
>
> With your patch applied, the effort will be reverted to
>
>     add r0, r0, #40
>     mov r3, #1
>     str r3, [r0, r1, asl #2]
>

And another one is:

typedef int arr_1[20];
void foo (arr_1 a1, int i)
{
  a1[i+10] = 1;
  a1[i+11] = 1;
}

This should compile to:

    add r1, r0, r1, asl #2
    mov r3, #1
    str r3, [r1, #40]
    str r3, [r1, #44]

which on Thumb2 should then collapse to:

    add r1, r0, r1, asl #2
    mov r3, #1
    strd r3, r3, [r1, #40]

With your patch I don't see any chance of being able to get to this
situation.

(BTW, we currently generate:

    mov r3, #1
    add r1, r1, #10
    add r2, r0, r1, asl #2
    str r3, [r0, r1, asl #2]
    str r3, [r2, #4]

which is insane).

I think I see where you're coming from on the original testcase, but I
think you're trying to solve the wrong problem. In your test case the
base is an eliminable register, which is likely to be replaced with an
offset expression during register allocation. The problem then, I
think, is that the cost of these virtual registers is treated the same
as any other pseudo register, when it may really have the cost of a
PLUS expression.

Perhaps the cost of using an eliminable register should be raised in
rtx_costs() (treating them as equivalent to
(PLUS (reg) (CONST_INT (TBD)))), so that loop optimizations will try to
hoist suitable sub-expressions out of the loop and replace them with
real pseudos.

R.
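
For anyone following the address arithmetic, here is a rough C sketch of the two groupings being compared. It is only an illustration, not compiler code: the helper names addr_offset_last, addr_offset_first and check_groupings are made up for this example. Both groupings reach the same element, but only the offset-last form leaves a single shared base from which a1[i+10] and a1[i+11] differ by one immediate (40 and 44 bytes when int is 4 bytes, which is assumed in the assembly above).

```c
#include <assert.h>
#include <stddef.h>

typedef int arr_1[20];

/* Hypothetical helper: (base + index*scale) + offset.
   The immediate offset is the last addend, so the partial sum
   base + index*scale can be shared between neighbouring accesses. */
static char *addr_offset_last(int *base, int i, int k)
{
    char *shared = (char *)base + (size_t)i * sizeof(int);
    return shared + (size_t)k * sizeof(int);
}

/* Hypothetical helper: (base + offset) + index*scale.
   The immediate is folded into the base first, so each distinct
   offset needs its own adjusted base. */
static char *addr_offset_first(int *base, int i, int k)
{
    char *adjusted = (char *)base + (size_t)k * sizeof(int);
    return adjusted + (size_t)i * sizeof(int);
}

/* Checks every index i for which a1[i+10] and a1[i+11] are both in
   bounds, and returns how many indices were checked. */
static int check_groupings(void)
{
    arr_1 a1 = {0};
    int checked = 0;

    for (int i = 0; i + 11 < 20; i++) {
        /* Both groupings name the same element. */
        assert(addr_offset_last(a1, i, 10) == (char *)&a1[i + 10]);
        assert(addr_offset_first(a1, i, 10) == (char *)&a1[i + 10]);

        /* With the offset last, the two stores differ only by one
           element size relative to the same shared base. */
        assert(addr_offset_last(a1, i, 11) - addr_offset_last(a1, i, 10)
               == (ptrdiff_t)sizeof(int));
        checked++;
    }
    return checked;
}
```

Calling check_groupings() from a test harness runs the checks for i = 0 through 8. The sketch says nothing about which grouping is cheaper; that still depends on the target's addressing modes and on the costs discussed above.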