Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate

Andrew Stubbs Fri, 15 Apr 2011 03:54:55 -0700

Ping.

On 25/03/11 16:19, Andrew Stubbs wrote:

On 28/01/11 15:20, Richard Earnshaw wrote:


On Fri, 2011-01-28 at 15:13 +0000, Andrew Stubbs wrote:

On 28/01/11 14:12, Richard Earnshaw wrote:

So what happens to a variation of your testcase:

long long foolong (long long x, short *a, short *b)
{
  return x + (long long)*a * (long long)*b;
}

With your patch? This should generate identical code to your original
test-case.


The patch has no effect on that testcase - it's broken in some other
way, I think, and the same with and without my patch:

ldrsh r3, [r3, #0]
ldrsh r2, [r2, #0]
push {r4, r5}
asrs r4, r3, #31
asrs r5, r2, #31
mul r4, r2, r4
mla r4, r3, r5, r4
umull r2, r3, r2, r3
adds r3, r4, r3
adds r0, r0, r2
adc r1, r1, r3
pop {r4, r5}
bx lr

Hmmm, that probably doesn't add anything useful to the discussion. :(

I'll add that one to the todo list ...

Andrew


Ouch! I thought that used to work :-(



I looked at this one again, but on a second inspection I'm not sure
there's much wrong with it?

When I wrote the above I thought that there was a 64-bit multiply
instruction, but now I look more closely I see there isn't, hence the
above. It does two 16-bit loads, sign-extends the inputs to 64-bit, does
a 64-bit -> 64-bit multiply, and then adds 'x'.
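
To make that reading of the listing concrete, here is a rough C model of the quoted sequence, one statement per instruction. It is only a sketch: the function and variable names (foolong_ref, foolong_model, a_lo, cross, ...) are made up for illustration and do not come from the patch or from GCC, and the final cast back to a signed type assumes the usual two's-complement wraparound.

```c
#include <stdint.h>

/* Reference: what the source-level expression asks for. */
int64_t foolong_ref(int64_t x, const int16_t *a, const int16_t *b)
{
    return x + (int64_t)*a * (int64_t)*b;
}

/* Hypothetical instruction-by-instruction model of the quoted listing. */
int64_t foolong_model(int64_t x, const int16_t *a, const int16_t *b)
{
    uint32_t b_lo = (uint32_t)(int32_t)*b;           /* ldrsh r3, [r3, #0] */
    uint32_t a_lo = (uint32_t)(int32_t)*a;           /* ldrsh r2, [r2, #0] */
    uint32_t b_hi = (*b < 0) ? 0xFFFFFFFFu : 0u;     /* asrs r4, r3, #31   */
    uint32_t a_hi = (*a < 0) ? 0xFFFFFFFFu : 0u;     /* asrs r5, r2, #31   */

    uint32_t cross = a_lo * b_hi;                    /* mul r4, r2, r4     */
    cross += b_lo * a_hi;                            /* mla r4, r3, r5, r4 */

    uint64_t prod = (uint64_t)a_lo * b_lo;           /* umull r2, r3, r2, r3 */
    uint32_t p_lo = (uint32_t)prod;
    uint32_t p_hi = (uint32_t)(prod >> 32) + cross;  /* adds r3, r4, r3    */

    /* adds r0, r0, r2 / adc r1, r1, r3: 64-bit add of the product to x. */
    uint64_t product = ((uint64_t)p_hi << 32) | p_lo;
    return (int64_t)((uint64_t)x + product);
}
```

The cross terms only touch the high word, which is why they can be computed with 32-bit mul/mla and why a_hi * b_hi never appears: it would land entirely above bit 63.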

Can the umull/add/add/adc be optimized using umlal? It's too late on a
Friday to work out what's going on with the carries ....
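
For what it's worth, the carries can be checked on paper with a small C model. This is a sketch under assumptions: it takes umlal to mean "RdHi:RdLo += Rn * Rm" as an unsigned 64-bit accumulate, the function names are invented for illustration, and it says nothing about whether the compiler can or should produce the second form.

```c
#include <stdint.h>

/* Shape of the quoted code: umull, fold the cross terms into the
   product's high word, then a 64-bit add of x (adds + adc). */
uint64_t accumulate_umull(uint64_t x, uint32_t a_lo, uint32_t b_lo,
                          uint32_t cross)
{
    uint64_t prod = (uint64_t)a_lo * b_lo;            /* umull */
    uint32_t p_hi = (uint32_t)(prod >> 32) + cross;   /* adds  */
    uint64_t full = ((uint64_t)p_hi << 32) | (uint32_t)prod;
    return x + full;                                  /* adds + adc */
}

/* Hypothetical alternative: let umlal accumulate straight into x,
   then add the cross terms into the high word afterwards. */
uint64_t accumulate_umlal(uint64_t x, uint32_t a_lo, uint32_t b_lo,
                          uint32_t cross)
{
    uint64_t acc = x + (uint64_t)a_lo * b_lo;         /* umlal */
    uint32_t hi = (uint32_t)(acc >> 32) + cross;      /* add   */
    return ((uint64_t)hi << 32) | (uint32_t)acc;
}
```

Both functions compute x + a_lo * b_lo + cross * 2^32 modulo 2^64, so in this model the carry out of the low word is handled the same way in either order.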

Also, I don't fully understand why the compiler can't just disregard the
casts and use maddhidi4? Isn't that mathematically equivalent in this case?
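
On the arithmetic side of that question, a quick C sketch: the product of two 16-bit signed values always fits in 32 bits (the largest magnitude is 32768 * 32768 = 2^30), so multiplying narrow and then widening gives the same value as widening first. The helper names below are hypothetical, and this only illustrates the arithmetic, not how the compiler matches the pattern.

```c
#include <stdint.h>

/* Widen both operands to 64 bits first, as the casts in foolong ask for. */
int64_t mac_widen_first(int64_t x, int16_t a, int16_t b)
{
    return x + (int64_t)a * (int64_t)b;
}

/* Multiply as 16 x 16 -> 32 first, then widen the product and accumulate.
   The 32-bit product cannot overflow: |a * b| <= 2^30. */
int64_t mac_narrow_first(int64_t x, int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    return x + (int64_t)p;
}
```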

When you say it used to work, what did it use to look like?

Thanks

Andrew
