On 28/01/11 15:20, Richard Earnshaw wrote:
> On Fri, 2011-01-28 at 15:13 +0000, Andrew Stubbs wrote:
>> On 28/01/11 14:12, Richard Earnshaw wrote:
>>> So what happens to a variation of your testcase:
>>>
>>> long long foolong (long long x, short *a, short *b)
>>> {
>>>   return x + (long long)*a * (long long)*b;
>>> }
>>>
>>> With your patch? This should generate identical code to your
>>> original test-case.
>>
>> The patch has no effect on that testcase - it's broken in some other
>> way, I think, and the same with and without my patch:
>>
>>         ldrsh   r3, [r3, #0]
>>         ldrsh   r2, [r2, #0]
>>         push    {r4, r5}
>>         asrs    r4, r3, #31
>>         asrs    r5, r2, #31
>>         mul     r4, r2, r4
>>         mla     r4, r3, r5, r4
>>         umull   r2, r3, r2, r3
>>         adds    r3, r4, r3
>>         adds    r0, r0, r2
>>         adc     r1, r1, r3
>>         pop     {r4, r5}
>>         bx      lr
>>
>> Hmmm, that probably doesn't add anything useful to the discussion. :(
>> I'll add that one to the todo list ...
>>
>> Andrew
>
> Ouch! I thought that used to work :-(

I looked at this one again, but on a second inspection I'm not sure there's much wrong with it?
When I wrote the above I thought that there was a 64-bit multiply instruction, but now that I look more closely I see there isn't, hence the sequence above. It does two 16-bit loads, sign-extends the inputs to 64 bits, does a 64-bit x 64-bit -> 64-bit multiply, and then adds 'x'.
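
For reference, here is a rough C model of that open-coded multiply. It is only a sketch of the arithmetic, not GCC code: the names mul64_from_halves and foolong_model are made up for illustration, the testcase's pointers are replaced by plain short values, and the mapping of statements to instructions in the comments is a reading of the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: low 64 bits of a * b, built from 32-bit pieces
   in the same shape as the mul/mla/umull sequence.  */
uint64_t mul64_from_halves (uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

  /* mul + mla: the two cross terms.  Only their low 32 bits can
     reach the 64-bit result, so 32-bit wrapping arithmetic is enough.  */
  uint32_t cross = a_lo * b_hi + a_hi * b_lo;

  /* umull: full 32 x 32 -> 64 unsigned product of the low halves.  */
  uint64_t low = (uint64_t) a_lo * b_lo;

  /* adds r3, r4, r3: fold the cross terms into the high word.  */
  return low + ((uint64_t) cross << 32);
}

/* Model of the testcase: sign-extend both inputs to 64 bits, multiply,
   then add x (the final adds/adc pair).  */
long long foolong_model (long long x, short a, short b)
{
  uint64_t prod = mul64_from_halves ((uint64_t) (int64_t) a,
                                     (uint64_t) (int64_t) b);
  return (long long) ((uint64_t) x + prod);
}
```

With these definitions, foolong_model (5, -3, 4) gives -7, the same as x + (long long) a * (long long) b.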
Can the umull/adds/adds/adc be optimized using umlal? It's too late on a Friday to work out what's going on with the carries ....
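
As a starting point for the carry question, the two orderings can be compared in C. This is a sketch under assumptions: umlal is modelled as a plain 64-bit accumulate of an unsigned 32 x 32 product, the function names are invented here, and the "add r1, r1, r4" in the comment is a hypothetical replacement instruction, not something the compiler is known to emit. Since everything is arithmetic modulo 2^64, the order of the additions should not matter, but this says nothing about whether the compiler can be taught to pick it.

```c
#include <assert.h>
#include <stdint.h>

/* Model of UMLAL RdLo, RdHi, Rn, Rm: RdHi:RdLo += Rn * Rm (unsigned).  */
uint64_t umlal_model (uint64_t acc, uint32_t rn, uint32_t rm)
{
  return acc + (uint64_t) rn * rm;
}

/* Shape of the current sequence: form the product first, then add x.  */
uint64_t accumulate_umull (uint64_t x, uint32_t a_lo, uint32_t b_lo,
                           uint32_t cross)
{
  uint64_t prod = (uint64_t) a_lo * b_lo;   /* umull */
  prod += (uint64_t) cross << 32;           /* adds r3, r4, r3 */
  return x + prod;                          /* adds + adc */
}

/* Hypothetical alternative: fold the cross terms into the high word
   of x first, then accumulate the low product with umlal.  */
uint64_t accumulate_umlal (uint64_t x, uint32_t a_lo, uint32_t b_lo,
                           uint32_t cross)
{
  x += (uint64_t) cross << 32;              /* e.g. add r1, r1, r4 */
  return umlal_model (x, a_lo, b_lo);       /* umlal r0, r1, r2, r3 */
}
```

Both functions return the same value for any inputs, including ones where the intermediate additions carry out of the high word.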
Also, I don't fully understand why the compiler can't just disregard the casts and use maddhidi4? Isn't that mathematically equivalent in this case?
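
The equivalence itself is easy to check in C. The sketch below assumes int is at least 32 bits (true on ARM), so that the product of two shorts, at most 2^30 in magnitude, cannot overflow; the function names are made up for illustration, and the testcase's pointers are replaced by plain short values.

```c
#include <assert.h>
#include <stddef.h>

/* The testcase as written: widen both inputs to 64 bits, then multiply.  */
long long mac_wide (long long x, short a, short b)
{
  return x + (long long) a * (long long) b;
}

/* Ignoring the casts: a 16 x 16 -> 32 signed multiply, widened only
   for the accumulate.  Assumes int is at least 32 bits.  */
long long mac_narrow (long long x, short a, short b)
{
  int prod = (int) a * (int) b;   /* |prod| <= 2^30, so no overflow */
  return x + (long long) prod;
}

/* Compare the two forms on the boundary values of short.
   Returns 1 if they agree everywhere tested, 0 otherwise.  */
int casts_are_equivalent (void)
{
  static const short edge[] = { -32768, -32767, -1, 0, 1, 32767 };
  size_t n = sizeof edge / sizeof edge[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (mac_wide (-1, edge[i], edge[j]) != mac_narrow (-1, edge[i], edge[j]))
        return 0;
  return 1;
}
```

This only shows that the two source forms compute the same value; whether the compiler is allowed to see through the casts at the point where it matches the pattern is a separate question.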
When you say it used to work, what did it use to look like?

Thanks

Andrew