On Fri, Feb 23, 2001 at 10:28:58PM +0000, Philip Blundell wrote: > This kind of thing (from longlong.h) is never good news. Someone probably > needs to figure out what the `arm3' version is doing and why it doesn't work > on armv4. (It ought not to be using explicit hard registers either.)
Do you have anything apart from its comment to suggest that it even works on arm3? It appears to be splitting the numbers a and b (which it is about to multiply) into top and bottom 16 bits ie 0xAAAAaaaa * 0xBBBBbbbb to ((0xAAAA << 16) + 0xaaaa) * ((0xBBBB << 16) + 0xbbbb) to multiply and then sum: (0xAAAA * 0xBBBB) << 32 + (0xAAAA * 0xbbbb) << 16 + (0xaaaa * 0xBBBB) << 16 + (0xaaaa * 0xbbbb) (the "addcs %0, %0, #65536" is where it propagates the [carry from adding the central pair of partial products] into the result high word]) except that > /* SAM This does not work on arm4 */ > #define umul_ppmm(xh, xl, a, b) \ > __asm__ ("%@ Inlined umul_ppmm > mov %|r0, %2, lsr #16 > mov %|r2, %3, lsr #16 ^^^^ > bic %|r1, %2, %|r0, lsl #16 > bic %|r2, %3, %|r2, lsl #16 ^^^^ I think the author has messed up by using the same register twice here. I think it should be %0 (untested) > mul %1, %|r1, %|r2 > mul %|r2, %|r0, %|r2 > mul %|r1, %0, %|r1 > mul %0, %|r0, %0 > adds %|r1, %|r2, %|r1 > addcs %0, %0, #65536 > adds %1, %1, %|r1, lsl #16 > adc %0, %0, %|r1, lsr #16" \ > : "=&r" ((USItype)(xh)), \ > "=r" ((USItype)(xl)) \ > : "r" ((USItype)(a)), \ > "r" ((USItype)(b)) \ > : "r0", "r1", "r2") > #elif __ARM_ARCH_4__ > #define umul_ppmm(xh, xl, a, b) \ > __asm__ ("%@ Inlined umul_ppmm > umull %r1, %r0, %r2, %r3" \ > : "=&r" ((USItype)(xh)), \ > "=r" ((USItype)(xl)) \ > : "r" ((USItype)(a)), \ > "r" ((USItype)(b)) \ > : "r0", "r1") > #else > #error Untested architecture > #endif I don't believe that the non ARM3 code should actually work, and I see no reason why (if I've found the bug) the corrected code should not work equally well on ARM4 (but more slowly than an umull ?) Nicholas Clark