Hi Jim,

This looks like a general issue with reg+reg addressing modes being generated in cases where they are not correct. I haven't looked at lmbench in a while, but it generated absolutely horrible code like:
        add x1, x0, #120
        ldr v0, [x2, x1]
        add x1, x0, #128
        ldr v1, [x2, x1]

If this is still happening, we need to fix it in a general way, since this is bad for any CPU even when reg+reg is fast. Reduced testcases for this would be welcome, as many other targets are likely affected. A while ago I posted a patch to reassociate (x + C1) * C2 -> x * C2 + C3 to improve cases like this.

Note we already support adjusting addressing costs, and there are several other CPUs which increase the reg+reg cost. So that's where I would start, assuming the reg+reg addressing mode was otherwise correctly used.

Finally, if you really want to completely disable a particular addressing mode, it's best done in classify_address rather than by changing the md patterns. My experience is that if you use anything other than the standard 'm' constraint, GCC reload starts to generate inefficient code even when the pattern should still apply. I have posted several patches that use 'm' more widely, to get better spill code and efficient expansions when the offset happens to be too large.

Wilco