I did a simple test: I compiled your version and the version below to
get a sense of how fast each one is.
Your version with RDTSC is quite slow compared to the version below.
RDTSC itself accounts for almost all of that time: just for fun I took
out that call, and it was much faster.
I generated 100 million random numbers in a for loop, with an extra test
to print out the last 10 numbers, and here are the times:
CMWC4096 = 0.712 seconds
KESSEL = 4.504 seconds
Here is the loop I used:
for (i=100000000; i>0; i--) {
    cur = CMWC4096();
    if (i <= 10) {
        printf("%20u %20d\n", cur, (int) cur - last);
    }
    last = cur;
}
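For anyone who wants to repeat the measurement, here is a minimal sketch of a timing helper built around the same kind of loop. It is not the harness I actually used: `time_generator` and the stand-in generator `toy_lcg` are hypothetical names introduced only so the sketch compiles on its own, and `clock()` reports CPU time rather than wall-clock time.

```c
#include <time.h>

/* Hypothetical stand-in generator so the sketch compiles on its own;
   replace it with CMWC4096 (or a wrapper around rdtsc_rand) to
   reproduce the comparison. */
static unsigned int toy_lcg(void)
{
    static unsigned int state = 12345u;
    state = state * 1664525u + 1013904223u;   /* 32-bit LCG step */
    return state;
}

/* Calls gen() n times and returns the CPU time used, in seconds.
   The XOR of all outputs is stored in *sink so the compiler cannot
   discard the calls as dead code. */
static double time_generator(unsigned int (*gen)(void), unsigned long n,
                             unsigned int *sink)
{
    unsigned int acc = 0;
    unsigned long i;
    clock_t start = clock();
    for (i = 0; i < n; i++)
        acc ^= gen();
    *sink = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_generator(toy_lcg, 100000000UL, &sink)` and printing the returned value gives a figure that can be compared across generators, provided all of them are built with the same compiler flags.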
And the code for CMWC4096 (without the initialization code):
static unsigned long Q[4096], c=362436;   /* lag table and carry */
unsigned int CMWC4096(void)
{
    unsigned long long t, a=18782LL, b=4294967295LL;   /* b = 2^32 - 1 */
    static int i=4095;
    unsigned int r=b-1;
    i=(i+1)&4095;             /* step cyclically through Q */
    t=a*Q[i]+c;
    c=(t>>32);                /* new carry: the high 32 bits */
    t=(t&b)+c;
    if(t>r) {c++; t=t-b;}     /* keep t below b */
    return(Q[i]=r-t);
}
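Since the initialization code is left out above, here is one possible way to fill the table before the first call. This is only a sketch, not the seeding I used: `seed_table` is a hypothetical helper, and the LCG constants are an arbitrary choice for illustration. Without any seeding, Q starts out zero-filled because it is static.

```c
#include <stddef.h>

/* Hypothetical helper, not from the original code: fills q[0..n-1]
   with 32-bit values produced by a simple LCG started from seed. */
static void seed_table(unsigned long *q, size_t n, unsigned long seed)
{
    unsigned long s = seed & 0xFFFFFFFFUL;
    size_t k;
    for (k = 0; k < n; k++) {
        /* mask so the value stays in 32 bits even where long is 64 bits */
        s = (s * 1664525UL + 1013904223UL) & 0xFFFFFFFFUL;
        q[k] = s;
    }
}
```

With the declarations above, it would be called once as `seed_table(Q, 4096, 12345UL);` before generating any numbers. The same seed always gives the same stream, which is convenient for repeatable timing runs.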
A van Kessel wrote:
A reasonable xor+shift random generator (similar to Marsaglia's, but with
only 64 bits instead of 128 bits), using the Pentium's rdtsc instruction to
add additional "entropy" (I used GNU's inline assembler):
/* Note: the "=A" constraint yields the EDX:EAX pair only on 32-bit x86. */
#define rdtscll(val) \
    __asm__ __volatile__("rdtsc" : "=A" (val))
typedef unsigned long long BIGTHING;
BIGTHING rdtsc_rand(void);
/******************************************************/
BIGTHING rdtsc_rand(void)
{
    static BIGTHING val=0x0000000011111111ULL;
    BIGTHING new;
    rdtscll(new);                                /* read the timestamp counter */
    val ^= (val >> 15) ^ (val << 14) ^ 9 ^ new;  /* xor+shift, mixed with the timestamp */
    return val;
}
/******************************************************/
The extra ^9 can be omitted if "new" is nonzero.
(It is only meant to keep "val" from becoming all zeros, since shifting
zeros won't change the value ;-)
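The all-zeros remark can be checked with a small sketch. `xs_step` is a hypothetical helper, not part of the code above: it performs the same xor+shift step, but takes the extra term (the constant and/or the timestamp) as an argument so the result is reproducible.

```c
typedef unsigned long long BIGTHING;

/* One xor+shift step as in rdtsc_rand, with everything that is XORed
   in on top of the shifts passed explicitly as extra. */
static BIGTHING xs_step(BIGTHING val, BIGTHING extra)
{
    return val ^ (val >> 15) ^ (val << 14) ^ extra;
}
```

Here `xs_step(0, 0)` returns 0, so a zero state with nothing mixed in stays zero forever, while `xs_step(0, 9)` returns 9 and the state escapes.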
HTH,
AvK
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/