Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate

Andrew Stubbs Fri, 25 Mar 2011 09:20:05 -0700

On 28/01/11 15:20, Richard Earnshaw wrote:


On Fri, 2011-01-28 at 15:13 +0000, Andrew Stubbs wrote:

On 28/01/11 14:12, Richard Earnshaw wrote:

So what happens to a variation of your testcase:

    long long foolong (long long x, short *a, short *b)
    {
      return x + (long long)*a * (long long)*b;
    }

With your patch?  This should generate identical code to your original
test-case.


The patch has no effect on that testcase - it's broken in some other
way, I think, and the same with and without my patch:

          ldrsh   r3, [r3, #0]
          ldrsh   r2, [r2, #0]
          push    {r4, r5}
          asrs    r4, r3, #31
          asrs    r5, r2, #31
          mul     r4, r2, r4
          mla     r4, r3, r5, r4
          umull   r2, r3, r2, r3
          adds    r3, r4, r3
          adds    r0, r0, r2
          adc     r1, r1, r3
          pop     {r4, r5}
          bx      lr

Hmmm, that probably doesn't add anything useful to the discussion. :(

I'll add that one to the todo list ...

Andrew


Ouch!  I thought that used to work :-(

I looked at this one again, but on a second inspection I'm not sure there's much wrong with it?

When I wrote the above I thought that there was a 64-bit multiply instruction, but now I look more closely I see there isn't, hence the above. It does two 16-bit loads, sign-extends the inputs to 64-bit, does a 64-bit -> 64-bit multiply, and then adds 'x'.
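
To make that reading of the listing concrete, here is a rough C model of the mul/mla/umull/adds/adc sequence. It is only a sketch: the function names are made up for illustration, and the mapping of each statement to an instruction is my interpretation of the assembly, not something taken from GCC.

```c
#include <stdint.h>

/* Hypothetical model of the 64-bit multiply built from 32-bit pieces.
   Each 64-bit operand is {hi:lo}; only the low 64 bits of the product
   are kept, so the hi*hi term drops out entirely. */
static int64_t mul64_via_32bit_parts(int16_t a, int16_t b)
{
    /* ldrsh: sign-extend each halfword to 32 bits (the low words). */
    uint32_t a_lo = (uint32_t)(int32_t)a;
    uint32_t b_lo = (uint32_t)(int32_t)b;

    /* asrs #31: high words of the 64-bit sign extension (0 or all ones). */
    uint32_t a_hi = a < 0 ? 0xFFFFFFFFu : 0u;
    uint32_t b_hi = b < 0 ? 0xFFFFFFFFu : 0u;

    /* mul + mla: cross terms, of which only the low 32 bits matter. */
    uint32_t cross = a_lo * b_hi + b_lo * a_hi;

    /* umull: full unsigned 32x32 -> 64 product of the low words. */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* adds r3, r4, r3: fold the cross terms into the high word. */
    uint32_t hi = (uint32_t)(low >> 32) + cross;

    /* Conversion back to signed assumes two's complement wrapping. */
    return (int64_t)(((uint64_t)hi << 32) | (uint32_t)low);
}

/* adds/adc: the final 64-bit addition of x, done in unsigned
   arithmetic so that wrap-around is well defined in C. */
static int64_t foolong_model(int64_t x, int16_t a, int16_t b)
{
    return (int64_t)((uint64_t)x + (uint64_t)mul64_via_32bit_parts(a, b));
}
```

For example, foolong_model(100, -2, 5) gives 90, the same as the plain C expression in the testcase.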

Can the umull/add/add/adc be optimized using umlal? It's too late on a Friday to work out what's going on with the carries ...
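
One way to think about the carries is to model umlal in C as a wrapping 64-bit accumulate and then add the cross terms to the high word only. The sketch below assumes that model of umlal (accumulator += unsigned 32x32 product, modulo 2^64); the function names are hypothetical, and it says nothing about whether the compiler can actually be made to generate this.

```c
#include <stdint.h>

/* Assumed model of UMLAL: {hi:lo} += rn * rm, wrapping modulo 2^64. */
static uint64_t umlal_model(uint64_t acc, uint32_t rn, uint32_t rm)
{
    return acc + (uint64_t)rn * rm;
}

/* x + a*b using one umlal on x, then a single 32-bit add of the
   cross terms into the high word. */
static int64_t foolong_umlal_model(int64_t x, int16_t a, int16_t b)
{
    uint32_t a_lo = (uint32_t)(int32_t)a;
    uint32_t b_lo = (uint32_t)(int32_t)b;
    uint32_t a_hi = a < 0 ? 0xFFFFFFFFu : 0u;
    uint32_t b_hi = b < 0 ? 0xFFFFFFFFu : 0u;

    /* mul + mla, as in the existing sequence. */
    uint32_t cross = a_lo * b_hi + b_lo * a_hi;

    /* The carry from the low word into the high word is handled
       inside the 64-bit accumulate. */
    uint64_t acc = umlal_model((uint64_t)x, a_lo, b_lo);

    /* Shifting by 32 means this only touches the high word, so no
       carry handling is needed for it. */
    acc += (uint64_t)cross << 32;

    /* Conversion back to signed assumes two's complement wrapping. */
    return (int64_t)acc;
}
```

Because every step is addition modulo 2^64, the order in which x, the low product and the cross terms are summed does not change the result.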

Also, I don't fully understand why the compiler can't just disregard the casts and use maddhidi4? Isn't that mathematically equivalent in this case?
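
As a sanity check on the equivalence, the two forms can be compared directly in C. This is only a sketch with hypothetical function names; it assumes maddhidi4 means a signed 16x16 multiply whose result is widened and accumulated into a 64-bit value.

```c
#include <stdint.h>

/* The testcase as written: both operands cast to 64 bits first. */
static int64_t mac_with_casts(int64_t x, int16_t a, int16_t b)
{
    return x + (int64_t)a * (int64_t)b;
}

/* The widening form: 16x16 multiply, then extend and accumulate.
   The product of two 16-bit values has magnitude at most 2^30, so
   it always fits in 32 bits and no information is lost. */
static int64_t mac_widening(int64_t x, int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return x + (int64_t)product;
}
```

The extreme case is a = b = -32768, where the product is 2^30 and still representable, so the two functions agree for every pair of 16-bit inputs.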


When you say it used to work, what did it use to look like?

Thanks

Andrew
