Dear all,

I've been working on describing the cost of loads/stores on my target to GCC, and I ran into the following problem. Consider this code:

    uint64_t sum = 0;
    for (i = 0; i < N; i += 2) {   /* N is defined by a macro */
        z0 = buff[i];
        z1 = buff[i+1];
        sum += z0 + z1;
    }

Depending on how buff is declared (local, global, or a parameter of the function), I get different code for the loop.

For global and local definitions of buff:

    $L2:
        ldd     r6,8(r10)
        ldd     r7,0(r10)
        addi    r10,r10,16
        cmpne   r8,r11,r10
        add     r6,r6,r7
        add     r9,r9,r6
        bt      r8,$L2

For the parameter, I get this:

    $L7:
        add     r6,r48,r10
        ldd     r8,0(r6)
        ldd     r7,0(r11)
        addi    r10,r10,16
        cmpine  r6,r10,1024
        addi    r11,r11,16
        add     r7,r7,r8
        add     r9,r9,r7
        bt      r6,$L7

I don't see why the compiler handles buff differently when it is a parameter of the function. It uses two address registers and fails to see that it could use a single one with an offset, as it does in the global/local cases.

Any idea why this happens in my code generation?

Now that I look at it, I wonder whether it is an addressing issue. Compare the way the end test is handled: for local and global arrays (where the compiler has full information about the array), the compare is done against the end address of the array, whereas for the parameter this is no longer the case and the loop is bounded by an iteration counter instead.

I have just confirmed this by defining the global either as a pointer or as an array (int *tab or int tab[128];). With the array, I get the code I would expect. With the pointer, I get the version I do not like.

Any ideas?

Thank you very much for your help,
Jean Christophe Beyler
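
P.S. A minimal sketch of the three variants, for anyone who wants to reproduce the comparison. It is a reconstruction, not the exact source: the names tab_array, tab_ptr, sum_array, sum_pointer and sum_param are hypothetical, and N = 128 with 8-byte elements is an assumption chosen to match the 1024-byte bound and the 16-byte stride in the listings above.

```c
#include <stdint.h>

#define N 128   /* assumed value; the post only says N comes from a macro */

/* Global array: its address and size are known to the compiler. */
uint64_t tab_array[N];

/* Global pointer: only the pointer is known, not what it points to. */
uint64_t *tab_ptr = tab_array;

/* Variant 1: the loop reads the global array directly. */
uint64_t sum_array(void)
{
    uint64_t sum = 0;
    for (int i = 0; i < N; i += 2) {
        uint64_t z0 = tab_array[i];
        uint64_t z1 = tab_array[i + 1];
        sum += z0 + z1;
    }
    return sum;
}

/* Variant 2: the same loop, but through the global pointer. */
uint64_t sum_pointer(void)
{
    uint64_t sum = 0;
    for (int i = 0; i < N; i += 2) {
        uint64_t z0 = tab_ptr[i];
        uint64_t z1 = tab_ptr[i + 1];
        sum += z0 + z1;
    }
    return sum;
}

/* Variant 3: the same loop, with the buffer passed as a parameter. */
uint64_t sum_param(const uint64_t *buff)
{
    uint64_t sum = 0;
    for (int i = 0; i < N; i += 2) {
        uint64_t z0 = buff[i];
        uint64_t z1 = buff[i + 1];
        sum += z0 + z1;
    }
    return sum;
}
```

Compiling this file with -O2 -S and comparing the three loops side by side should show whether the pointer and parameter variants get the two-register addressing while the array variant gets the single base register with offsets.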