Dear all,

I come back to you with another oddity: bad code generation on my target architecture. I have a very simplified (for the moment) rtx_costs, and my address_cost is inspired by the i386 version. To check whether I had done something wrong, I also patched in the whole i386_rtx_cost function along with its constraints and predicates, but I get the same results.
These are my two functions:

    uint64_t foo (uint64_t n, uint64_t m)
    {
        uint64_t sum = 0, i;

        for (i = n; i < n + m; i++) {
            sum += data[i] + data[i+13];
        }
        return sum;
    }

After the prologue of the loop, I get:

    mov  r1, theCorrectStartAddress
    load r2,0(r1)
    load r3,104(r1)

However, if I do this:

    uint64_t goo (uint64_t i)
    {
        return data[i] + data[i+13];
    }

I get:

    mov r1, Calculation of data[i]
    mov r2, Calculation of data[i+13]
    ldd r3,0(r1)
    ldd r4,0(r2)

It seems that inside a loop the compiler performs some optimization that lets it use the offset addressing mode, whereas without a loop each address is computed separately, so the address-calculation instructions are emitted twice.

As I said, I replaced the cost function with the x86 version and got the same thing, so I don't really know where to look. Could the expansion of my movDI/SI and the instruction definition have an implication like this?

Thanks for your input,
Jean Christophe Beyler
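
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the test case. The declaration of data is not shown above, so the global array of uint64_t and its length DATA_LEN are assumptions, and the helper offset_bytes is hypothetical, added only for illustration. It shows where the 104 in the loop version comes from: data[i+13] sits 13 * sizeof(uint64_t) = 104 bytes past data[i], so both loads can share one base register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the post does not show how data is declared.
   A global array of uint64_t with a hypothetical length is used here. */
#define DATA_LEN 1024
uint64_t data[DATA_LEN];

/* Loop version from the post. */
uint64_t foo (uint64_t n, uint64_t m)
{
    uint64_t sum = 0, i;

    for (i = n; i < n + m; i++) {
        sum += data[i] + data[i + 13];
    }
    return sum;
}

/* Straight-line version from the post. */
uint64_t goo (uint64_t i)
{
    return data[i] + data[i + 13];
}

/* Hypothetical helper: byte distance between data[i] and data[i + 13].
   This is the constant a base+offset addressing mode could fold into
   the second load. */
ptrdiff_t offset_bytes (uint64_t i)
{
    assert (i + 13 < DATA_LEN);  /* stay inside the array */
    return (char *) &data[i + 13] - (char *) &data[i];
}
```

Compiling this file to assembly with optimization enabled and comparing the bodies of foo and goo should show whether the second load in goo uses the constant offset or a separately computed address.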