Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

2012-09-24 Thread George Spelvin
Geert Uytterhoeven wrote:
> On Mon, Sep 24, 2012 at 3:56 PM, George Spelvin wrote:
>> SPARCv8 UMUL puts the high half of the 64-bit result into the Y
>> register, and SPARCv7 has a multiply-step instruction (MULScc) which
>> does likewise.
>
> Early SPARCs don't even have a multiply instruction.

Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

2012-09-24 Thread Geert Uytterhoeven
On Mon, Sep 24, 2012 at 3:56 PM, George Spelvin wrote:
> Michal Nazarewicz wrote:
>> Didn't some SPARCs have 32x32->32 multiply? I remember reading some
>> rant from a GMP developer about how SPARC is broken that way.
>
> SPARCv9 only has 64x64->64; there's no 128-bit result version.
> That cuts

Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

2012-09-24 Thread Denys Vlasenko
On Mon, Sep 24, 2012 at 2:35 PM, George Spelvin wrote:
>> Here is the comparison of the x86-32 assembly
>> of the fragment which does "x / 10000" thing,
>> before and after the patch:
>
>> -01 c6             add    %eax,%esi
>> -b8 59 17 b7 d1    mov    $0xd1b71759,%eax
>> -f7 e6

Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

2012-09-24 Thread George Spelvin
Michal Nazarewicz wrote:
> Didn't some SPARCs have 32x32->32 multiply? I remember reading some
> rant from a GMP developer about how SPARC is broken that way.

SPARCv9 only has 64x64->64; there's no 128-bit result version.
That cuts large-integer math speed by a factor of 4 (very crude approximat
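One way to read the factor-of-4 estimate: when the hardware only returns the low 64 bits of a product, the high half of a 64x64-bit multiply has to be assembled from four 32x32-bit partial products. The sketch below shows that decomposition in portable C. The names `mulhi64` and `check_mulhi64` are illustrative and do not come from the thread, and this is an assumption about what the estimate refers to, not something the message spells out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * High 64 bits of the 128-bit product a * b, using only multiplies
 * whose result fits in 64 bits (hypothetical helper for illustration).
 */
static uint64_t mulhi64(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

	/* Four 32x32->64-bit partial products. */
	uint64_t lo_lo = a_lo * b_lo;
	uint64_t hi_lo = a_hi * b_lo;
	uint64_t lo_hi = a_lo * b_hi;
	uint64_t hi_hi = a_hi * b_hi;

	/* Sum the middle column; its upper 32 bits carry into the high half. */
	uint64_t mid = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

	return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}

/* Small self-check on values whose high halves are easy to derive by hand. */
static void check_mulhi64(void)
{
	assert(mulhi64(UINT64_C(1) << 32, UINT64_C(1) << 32) == 1);
	assert(mulhi64(UINT64_MAX, 2) == 1);
}
```

A machine with a widening multiply gets the same result from a single instruction, which is the gap the message is pointing at.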

Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

2012-09-24 Thread Michal Nazarewicz
On Mon, Sep 24 2012, George Spelvin wrote:
>> You are using a 64-bit multiply in a path that is designed for 32-bit
>> processors, which makes me feel that it will be slower.
>
> Slower than the divide it's replacing?

OK, granted, it might be faster after all. ;) Still, I'd love to see some bench

Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

2012-09-24 Thread George Spelvin
> Here is the comparison of the x86-32 assembly
> of the fragment which does "x / 10000" thing,
> before and after the patch:

> -01 c6             add    %eax,%esi
> -b8 59 17 b7 d1    mov    $0xd1b71759,%eax
> -f7 e6             mul    %esi
> -89 d3             mov    %edx,%eb

Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

2012-09-24 Thread George Spelvin
> You are using a 64-bit multiply in a path that is designed for 32-bit
> processors, which makes me feel that it will be slower.

Slower than the divide it's replacing?

The following 32-bit processors have 32x32->64-bit multiply:

x86
ARM (as of ARMv4 = ARM7TDMI, the lowest version in common use)

Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

2012-09-24 Thread Denys Vlasenko
On Fri, Aug 3, 2012 at 7:21 AM, George Spelvin wrote:
> The same multiply-by-inverse technique can be used to
> convert division by 10000 to a 32x32->64-bit multiply.
>
> Signed-off-by: George Spelvin
> ---
> lib/vsprintf.c | 60 +++-
> 1

Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

2012-09-23 Thread Michal Nazarewicz
On Fri, Aug 03 2012, George Spelvin wrote:
> The same multiply-by-inverse technique can be used to
> convert division by 10000 to a 32x32->64-bit multiply.
>
> Signed-off-by: George Spelvin

You are using a 64-bit multiply in a path that is designed for 32-bit
processors, which makes me feel that

[PATCH 2/4] lib: vsprintf: Optimize division by 10000

2012-08-02 Thread George Spelvin
The same multiply-by-inverse technique can be used to
convert division by 10000 to a 32x32->64-bit multiply.

Signed-off-by: George Spelvin
---
 lib/vsprintf.c | 60 +++-
 1 file changed, 33 insertions(+), 27 deletions(-)

This is something of a
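For readers unfamiliar with the technique named in the patch description, here is a minimal sketch of division by 10000 done as one 32x32->64-bit multiply. The constant 0xd1b71759 is the one visible in the x86-32 disassembly quoted elsewhere in the thread. The shift count of 45 is not shown in the truncated excerpts; it is an assumption that follows from the constant, since 0xd1b71759 equals 2^45 / 10000 rounded up. The names `INV_10000`, `div10000` and `check_div10000` are illustrative, and this is not the code from lib/vsprintf.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ceil(2^45 / 10000): 0xd1b71759 * 10000 == 2^45 + 1168 */
#define INV_10000 UINT32_C(0xd1b71759)

/* x / 10000 for any 32-bit x, without a divide instruction (illustrative). */
static uint32_t div10000(uint32_t x)
{
	/* One widening multiply, then discard the low 45 bits of the product. */
	return (uint32_t)(((uint64_t)x * INV_10000) >> 45);
}

/* Compare against the compiler's own division on a few boundary values. */
static void check_div10000(void)
{
	static const uint32_t samples[] = {
		0, 1, 9999, 10000, 19999, 20000, 99999999,
		UINT32_C(0x7fffffff), UINT32_C(0xffffffff),
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(div10000(samples[i]) == samples[i] / 10000);
}
```

The rounding error in the constant is 1168 parts in 2^45 per unit of x, which stays below one quotient step for every x under 2^32, so the truncated result matches exact division across the whole 32-bit range.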