On 28/01/11 15:20, Richard Earnshaw wrote:

On Fri, 2011-01-28 at 15:13 +0000, Andrew Stubbs wrote:
On 28/01/11 14:12, Richard Earnshaw wrote:
So what happens to a variation of your testcase:

    long long foolong (long long x, short *a, short *b)
    {
        return x + (long long)*a * (long long)*b;
    }

With your patch?  This should generate identical code to your original
test-case.


The patch has no effect on that testcase. I think it's broken in some
other way; the output is the same with and without my patch:

          ldrsh   r3, [r3, #0]
          ldrsh   r2, [r2, #0]
          push    {r4, r5}
          asrs    r4, r3, #31
          asrs    r5, r2, #31
          mul     r4, r2, r4
          mla     r4, r3, r5, r4
          umull   r2, r3, r2, r3
          adds    r3, r4, r3
          adds    r0, r0, r2
          adc     r1, r1, r3
          pop     {r4, r5}
          bx      lr

Hmmm, that probably doesn't add anything useful to the discussion. :(

I'll add that one to the todo list ...

Andrew


Ouch!  I thought that used to work :-(


I looked at this one again, but on second inspection I'm not sure there's much wrong with it.

When I wrote the above I thought that there was a 64-bit multiply instruction, but now that I look more closely I see there isn't, hence the sequence above. It does two 16-bit loads, sign-extends the inputs to 64 bits, does a 64-bit x 64-bit -> 64-bit multiply, and then adds 'x'.
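
For reference, here is a rough C model of what that instruction sequence appears to compute. It is only a sketch: the function name mac64_by_parts and the variable names are made up for illustration, and the register mapping in the comments assumes 'a' arrives in r2 and 'b' in r3 (with 'x' in r0:r1).

```c
#include <stdint.h>

/* Hypothetical model of the generated sequence; not compiler output. */
static int64_t mac64_by_parts(int64_t x, int16_t a, int16_t b)
{
    /* ldrsh: load 16 bits and sign-extend to 32 bits. */
    uint32_t a_lo = (uint32_t)(int32_t)a;
    uint32_t b_lo = (uint32_t)(int32_t)b;

    /* asrs #31: the high word of the 64-bit value is all sign bits. */
    uint32_t a_hi = (a < 0) ? 0xFFFFFFFFu : 0u;
    uint32_t b_hi = (b < 0) ? 0xFFFFFFFFu : 0u;

    /* mul + mla: cross terms, truncated to 32 bits. */
    uint32_t cross = a_lo * b_hi + b_lo * a_hi;

    /* umull: unsigned 32 x 32 -> 64 product of the low words. */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* adds r3: fold the cross terms into the high word only. */
    uint64_t prod = low + ((uint64_t)cross << 32);

    /* adds/adc: 64-bit add of 'x' (modulo 2^64). */
    return (int64_t)((uint64_t)x + prod);
}
```

The a_hi * b_hi term is dropped because it would only affect bits 64 and above.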

Can the umull/add/add/adc be optimized using umlal? It's too late on a Friday to work out what's going on with the carries ...
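
A sketch of how the carries might work out, assuming umlal behaves as a 64-bit accumulate (RdHi:RdLo += Rn * Rm, unsigned). The function name mac64_umlal_style is hypothetical, and this only models the arithmetic; it says nothing about whether the compiler can be made to emit it.

```c
#include <stdint.h>

/* Hypothetical model: accumulate into 'x' first, then add cross terms. */
static int64_t mac64_umlal_style(int64_t x, int16_t a, int16_t b)
{
    uint32_t a_lo = (uint32_t)(int32_t)a;
    uint32_t b_lo = (uint32_t)(int32_t)b;
    uint32_t a_hi = (a < 0) ? 0xFFFFFFFFu : 0u;
    uint32_t b_hi = (b < 0) ? 0xFFFFFFFFu : 0u;

    /* mul + mla, as in the sequence quoted earlier. */
    uint32_t cross = a_lo * b_hi + b_lo * a_hi;

    /* umlal-style step: the carry out of the low word is absorbed here. */
    uint64_t acc = (uint64_t)x + (uint64_t)a_lo * b_lo;

    /* The cross terms touch only the high word, so a plain add suffices. */
    acc += (uint64_t)cross << 32;

    return (int64_t)acc;
}
```

If that model is right, the umull/adds/adds/adc would become one umlal plus one add to the high word.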

Also, I don't fully understand why the compiler can't just disregard the casts and use maddhidi4? Isn't that mathematically equivalent in this case?
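
The equivalence can be sketched in C. The product of two 16-bit values has magnitude at most 2^30, so it fits in 32 bits, and widening it afterwards should give the same result as multiplying in 64 bits. The function names below are made up for illustration.

```c
#include <stdint.h>

/* The testcase as written: both operands cast to 64 bits first. */
static int64_t mac_casts(int64_t x, int16_t a, int16_t b)
{
    return x + (int64_t)a * (int64_t)b;
}

/* Hypothetical widening form: 16 x 16 -> 32 multiply, then extend and add. */
static int64_t mac_widening(int64_t x, int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* cannot overflow: |p| <= 2^30 */
    return x + (int64_t)p;
}
```

Both functions should agree for every 16-bit 'a' and 'b', including the extreme case (-32768) * (-32768) = 2^30.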

When you say it used to work, what did it use to look like?

Thanks

Andrew
