From: Willy Tarreau
> Sent: 10 August 2020 12:47
> On Sun, Aug 09, 2020 at 06:30:17PM +0000, George Spelvin wrote:
> > Even something simple like buffering 8 TSC samples, and adding them
> > at 32-bit offsets across the state every 8th call, would make a huge
> > difference.
>
> Doing testing on real hardware showed that retrieving the TSC on every
> call had a non negligible cost, causing a loss of 2.5% on the accept()
> rate and 4% on packet rate when using iptables -m statistics. However
> I reused your idea of accumulating old TSCs to increase the uncertainty
> about their exact value, except that I retrieve it only on 1/8 calls
> and use the previous noise in this case. With this I observe the same
> performance as plain 5.8. Below are the connection rates accepted on
> a single core :
>
>         5.8            5.8+patch       5.8+patch+tsc
>    192900-197900    188800->192200    194500-197500    (conn/s)
>
> This was on a core i7-8700K. I looked at the asm code for the function
> and it remains reasonably light, in the same order of complexity as the
> original one, so I think we could go with that.
>
> My proposed change is below, in case you have any improvements to suggest.
>
> Regards,
> Willy
>
>
> diff --git a/lib/random32.c b/lib/random32.c
> index 2b048e2ea99f..a12d63028106 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -317,6 +317,8 @@ static void __init prandom_state_selftest(void)
>
>  struct siprand_state {
>  	unsigned long v[4];
> +	unsigned long noise;
> +	unsigned long count;
>  };
>
>  static DEFINE_PER_CPU(struct siprand_state, net_rand_state) __latent_entropy;
> @@ -334,7 +336,7 @@ static DEFINE_PER_CPU(struct siprand_state, net_rand_state) __latent_entropy;
>  #define K0 (0x736f6d6570736575 ^ 0x6c7967656e657261 )
>  #define K1 (0x646f72616e646f6d ^ 0x7465646279746573 )
>
> -#elif BITS_PER_LONG == 23
> +#elif BITS_PER_LONG == 32
>  /*
>   * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash.
>   * This is weaker, but 32-bit machines are not used for high-traffic
> @@ -375,6 +377,12 @@ static u32 siprand_u32(struct siprand_state *s)
>  {
>  	unsigned long v0 = s->v[0], v1 = s->v[1], v2 = s->v[2], v3 = s->v[3];
>
> +	if (++s->count >= 8) {
> +		v3 ^= s->noise;
> +		s->noise += random_get_entropy();
> +		s->count = 0;
> +	}
> +
Using:
	if (s->count-- <= 0) {
		...
		s->count = 8;
	}
probably generates better code, although you may want to use a 'signed int'
instead of an 'unsigned long' for the counter.

	David