Hi there. This is a follow-up to my email of the 24th of May.

The short version is: how can I track down why GCC picks one of two alternatives for implementing a function? In a memcpy() where Pmode == SImode, I get a near-ideal implementation. If Pmode == PSImode (due to limitations of the pointer registers) I get something much worse.

The difference happens early on. In the .128r.expand dump with Pmode == SImode I get:

    ;; MEM[base: to] = MEM[base: p];

With PSImode I get offset addressing instead:

    ;; MEM[base: pto + ivtmp.25] = MEM[base: pfrom + ivtmp.25];

This flows through into the generated code. I assume GCC treats PSImode differently from SImode, and that the cast/translation cost between them is enough to make offset addressing look cheaper overall. The m32c compiler is the only other one using PSImode, but it doesn't generate offset addresses. The same thing happens with and without a basic TARGET_ADDRESS_COST and TARGET_RTX_COSTS. I guess I want a way of telling the compiler that PSImode and SImode are equivalent.

The longer version is:

The machine I'm working on has two special registers for memory access that are backed by caches. Any change to these registers can cause an expensive cache load cycle, so while they're great for memory access they're terrible for general use. The problem is that with Pmode == SImode the register allocator will now and again use these registers for general operations. I've implemented a partial integer mode, PSImode, as suggested by Michael Meissner, and set Pmode to PSImode. This correctly separates things, but the compiler now generates significantly worse code.

The example is a simple memcpy():

    void copy(int *pfrom, int *pto, int count)
    {
      while (count != 0) {
        *pto = *pfrom;
        pto++;
        pfrom++;
        count--;
      }
    }

If I have #define Pmode SImode then I get the near-best code:

    copy:
            LOADACC, R12        ;# 133 loadaccsi_insn/1
            STOREACC, R13       ;# 134 storeaccsi_insn
            LOADLONG, #0        ;# 139 loadaccsi_insn/2
            XOR, R13            ;# 140 cmpccsi_insn/3
            LOADLONG, #.L4      ;# 43 *bCCeq
            SKIP_IF
            STOREACC, PC
            LOADACC, R11        ;# 121 loadaccsi_insn/1
            STOREACC, Y         ;# 122 storeaccsi_insn
            LOADACC, R10        ;# 127 loadaccsi_insn/1
            STOREACC, X         ;# 128 storeaccsi_insn
    .L3:
            LOADACC, (X)        ;# 79 loadaccsi_insn/1
            STOREACC, (Y)       ;# 86 storeaccsi_insn
            LOADLONG, #4        ;# 149 loadaccsi_insn/2
            ADD, Y              ;# 150 addsi3_acc
            ADD, X              ;# 151 addsi3_acc
            LOADLONG, #-1       ;# 103 loadaccsi_insn/2
            ADD, R12            ;# 104 addsi3_acc
            LOADACC, R12        ;# 109 loadaccsi_insn/1
            STOREACC, R10       ;# 110 storeaccsi_insn
            LOADLONG, #0        ;# 115 loadaccsi_insn/2
            XOR, R10            ;# 116 cmpccsi_insn/3
            LOADLONG, #.L3      ;# 57 *bCCne
            STOREACC, PC_IF
    .L4:
            POP                 ;# 147 *expanded_return
            STOREACC, PC

Note the good

            LOADACC, (X)        ;# 79 loadaccsi_insn/1
            STOREACC, (Y)       ;# 86 storeaccsi_insn
            LOADLONG, #4        ;# 149 loadaccsi_insn/2
            ADD, Y              ;# 150 addsi3_acc
            ADD, X              ;# 151 addsi3_acc

in the middle.

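In C terms, my reading of the two forms in the expand dump is roughly the following. This is only a sketch to make the difference concrete: the names copy_bump, copy_offset and ivtmp are made up for illustration (ivtmp stands in for ivtmp.25), and it is an assumption about what the dumps mean, not something GCC emits.

```c
#include <stddef.h>

/* Pointer-bumping form, as in  MEM[base: to] = MEM[base: p]:
   each pointer is dereferenced directly and then stepped by one int. */
static void copy_bump(const int *pfrom, int *pto, int count)
{
    while (count != 0) {
        *pto = *pfrom;
        pto++;
        pfrom++;
        count--;
    }
}

/* Base-plus-offset form, as in
   MEM[base: pto + ivtmp.25] = MEM[base: pfrom + ivtmp.25]:
   the two base pointers never change; a single byte offset
   (hypothetical name ivtmp) is added to both on every iteration. */
static void copy_offset(const int *pfrom, int *pto, int count)
{
    size_t ivtmp = 0;

    while (count != 0) {
        const int *src = (const int *)((const char *)pfrom + ivtmp);
        int *dst = (int *)((char *)pto + ivtmp);

        *dst = *src;
        ivtmp += sizeof(int);
        count--;
    }
}
```

Both functions copy the same data. The second keeps one induction variable instead of two, which looks cheaper on paper, but on this machine it means recomputing both addresses and reloading X and Y on every pass through the loop.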
Instead, if I have #define Pmode PSImode I get:

    copy:
            LOADACC, R14        ;# 186 loadaccsi_insn/1
            PUSH                ;# 187 pushsi_acc
            LOADACC, R12        ;# 163 loadaccsi_insn/1
            STOREACC, R13       ;# 164 storeaccsi_insn
            LOADLONG, #0        ;# 169 loadaccsi_insn/2
            XOR, R13            ;# 170 cmpccsi_insn/3
            LOADLONG, #.L4      ;# 43 *bCCeq
            SKIP_IF
            STOREACC, PC
            LOADLONG, #0        ;# 157 loadaccsi_insn/2
            STOREACC, R13       ;# 158 storeaccsi_insn
    .L3:
            LOADACC, R13        ;# 85 loadaccsi_insn/1
            STOREACC, X         ;# 86 storeaccsi_insn
            ; No-op truncate on X = X   ;# 47 truncsipsi2/1
            LOADACC, R11        ;# 91 loadaccpsi_insn/1
            STOREACC, Y         ;# 92 storeaccpsi_insn
            LOADACC, X          ;# 97 loadaccpsi_insn/1
            ADD, Y              ;# 98 addpsi3_acc
            LOADACC, R10        ;# 103 loadaccpsi_insn/1
            STOREACC, R14       ;# 104 storeaccpsi_insn
            LOADACC, X          ;# 109 loadaccpsi_insn/1
            ADD, R14            ;# 110 addpsi3_acc
            LOADACC, R14        ;# 115 loadaccpsi_insn/1
            STOREACC, X         ;# 116 storeaccpsi_insn
            LOADACC, (X)        ;# 121 loadaccsi_insn/1
            STOREACC, (Y)       ;# 128 storeaccsi_insn
            LOADLONG, #-1       ;# 133 loadaccsi_insn/2
            ADD, R12            ;# 134 addsi3_acc
            LOADLONG, #4        ;# 139 loadaccsi_insn/2
            ADD, R13            ;# 140 addsi3_acc
            LOADACC, R12        ;# 145 loadaccsi_insn/1
            STOREACC, X         ;# 146 storeaccsi_insn
            LOADLONG, #0        ;# 151 loadaccsi_insn/2
            XOR, X              ;# 152 cmpccsi_insn/3
            LOADLONG, #.L3      ;# 59 *bCCne
            STOREACC, PC_IF
    .L4:
            POP                 ;# 178 popsi_insn
            STOREACC, R14
            POP                 ;# 179 *expanded_return
            STOREACC, PC

This is equivalent to:

    R13 = 0
    L:
    X = R13
    X = truncate(X)
    Y = R11
    Y += X
    R14 = R10
    R14 += X
    X = R14
    (Y) = (X)
    R12 -= 1
    R13 += 4
    R14 = R12
    CMP R14, 0
    BCCNE

Thank you for your time,

-- Michael