Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

2015-03-18 Thread Rasmus Villemoes
On Wed, Mar 18 2015, Denys Vlasenko wrote:
> Your code does four 16-bit stores.
> The version below does two 32-bit ones instead,
> and it is also marginally smaller.
>
> char *put_dec_full8(char *buf, unsigned r)
> {
>         unsigned q;
>         u32 v;
>
>         /* 0 <= r < 10^8 */
>

Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

2015-03-17 Thread Denys Vlasenko
On Wed, Mar 18, 2015 at 1:50 AM, Denys Vlasenko wrote: > On Sat, Feb 21, 2015 at 12:51 AM, Rasmus Villemoes > wrote: >> The most expensive part of decimal conversion is the divisions by 10 >> (albeit done using reciprocal multiplication with appropriately chosen >> constants). I decided to see if

Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

2015-03-17 Thread Denys Vlasenko
On Sat, Feb 21, 2015 at 12:51 AM, Rasmus Villemoes wrote: > The most expensive part of decimal conversion is the divisions by 10 > (albeit done using reciprocal multiplication with appropriately chosen > constants). I decided to see if one could eliminate around half of > these multiplications by

Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

2015-03-10 Thread Rasmus Villemoes
On Tue, Mar 10 2015, Tejun Heo wrote: > Hello, > > On Tue, Mar 10, 2015 at 11:47:47AM +0100, Rasmus Villemoes wrote: >> I can't explain why num_to_str apparently becomes slightly slower (the >> patch essentially didn't touch it), but the put_dec_ helpers in any case >> make up for that. > > Unrel

Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

2015-03-10 Thread Tejun Heo
Hello, On Tue, Mar 10, 2015 at 11:47:47AM +0100, Rasmus Villemoes wrote: > I can't explain why num_to_str apparently becomes slightly slower (the > patch essentially didn't touch it), but the put_dec_ helpers in any case > make up for that. Unrelated code changes affecting performance in seemingl

Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

2015-03-10 Thread Rasmus Villemoes
On Thu, Mar 05 2015, Rasmus Villemoes wrote: > On Thu, Mar 05 2015, Tejun Heo wrote: > >> I'd like to see how this actually affects larger operations - sth >> along the line of top consumes D% less CPU cycles w/ N processes - if >> for nothing else, just to get the sense of scale, > > That makes

Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

2015-03-05 Thread Rasmus Villemoes
On Thu, Mar 05 2015, Tejun Heo wrote: > On Thu, Mar 05, 2015 at 08:03:33AM -0800, Joe Perches wrote: >> On Thu, 2015-03-05 at 16:22 +0100, Rasmus Villemoes wrote: >> >> > I'm assuming the underwhelming response means NAK. >> >> Dunno why you assume that, sometimes it just takes >> awhile for peo

Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

2015-03-05 Thread Tejun Heo
On Thu, Mar 05, 2015 at 08:03:33AM -0800, Joe Perches wrote: > On Thu, 2015-03-05 at 16:22 +0100, Rasmus Villemoes wrote: > > On Sat, Feb 21 2015, Rasmus Villemoes wrote: > > > > > [...] decimal conversion [...] it does indeed seem like there is > > > something to be gained, especially on 64 bits

Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

2015-03-05 Thread Joe Perches
On Thu, 2015-03-05 at 16:22 +0100, Rasmus Villemoes wrote: > On Sat, Feb 21 2015, Rasmus Villemoes wrote: > > > [...] decimal conversion [...] it does indeed seem like there is > > something to be gained, especially on 64 bits. > > > > $ ./test64 > > Distribution Function Cyc

Re: [RFC] lib/vsprintf.c: Even faster decimal conversion

2015-03-05 Thread Rasmus Villemoes
On Sat, Feb 21 2015, Rasmus Villemoes wrote: > [...] decimal conversion [...] it does indeed seem like there is > something to be gained, especially on 64 bits. > > $ ./test64 > Distribution Function Cycles/conv Conv/1 sec > uniform([10, 2^64-1]) linux_put_dec 1

[RFC] lib/vsprintf.c: Even faster decimal conversion

2015-02-20 Thread Rasmus Villemoes
The most expensive part of decimal conversion is the divisions by 10 (albeit done using reciprocal multiplication with appropriately chosen constants). I decided to see if one could eliminate around half of these multiplications by emitting two digits at a time, at the cost of a 200 byte lookup tab
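The two-digits-at-a-time idea described above can be sketched in plain userspace C. This is an illustration only, not the code from the patch: the names `digit_pairs`, `init_digit_pairs` and `u32_to_dec` are hypothetical, the table is filled at run time rather than being a static constant, and the digits are assembled most-significant-first through a small temporary buffer, which may differ from how the kernel helpers lay out their output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 200-byte table (hypothetical name): entry i holds the two ASCII
 * digits of i, for i in 0..99, so "07" lives at offset 14. */
static char digit_pairs[200];

/* Fill the table once before any conversion. */
static void init_digit_pairs(void)
{
    for (int i = 0; i < 100; i++) {
        digit_pairs[2 * i]     = (char)('0' + i / 10);
        digit_pairs[2 * i + 1] = (char)('0' + i % 10);
    }
}

/* Hypothetical helper: write the decimal form of v into buf
 * (NUL-terminated, needs at least 11 bytes) and return its length.
 * Each loop iteration divides by 100 and copies two digits from the
 * table, so roughly half as many divisions are needed compared with
 * a digit-by-digit loop. */
static size_t u32_to_dec(char *buf, uint32_t v)
{
    char tmp[10];                    /* 2^32 - 1 has 10 digits */
    char *p = tmp + sizeof(tmp);
    size_t len;

    while (v >= 100) {
        uint32_t r = v % 100;
        v /= 100;
        p -= 2;
        memcpy(p, &digit_pairs[2 * r], 2);
    }
    if (v >= 10) {
        /* Two leading digits left: one more table lookup. */
        p -= 2;
        memcpy(p, &digit_pairs[2 * v], 2);
    } else {
        /* Single leading digit: no table needed. */
        *--p = (char)('0' + v);
    }

    len = (size_t)(tmp + sizeof(tmp) - p);
    memcpy(buf, p, len);
    buf[len] = '\0';
    return len;
}
```

After a single call to `init_digit_pairs()`, `u32_to_dec(buf, 1234567)` leaves "1234567" in `buf` and returns 7. Whether the extra table pays for itself depends on cache behaviour and on the distribution of the numbers being printed, which is what the benchmarks in this thread set out to measure.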