By the way, I'm amazed that the code for playing random games is fast enough that getting random numbers is actually a bottleneck and it's worthy of a discussion on optimization.
One of the fastest chess programs a few years ago in terms of nodes per second was Fritz. It heavily used a common technique of accessing "piece square tables" to get an evaluation instantly - basically keeping a score incrementally updated in a table driven way. This was partly why it was so incredibly fast. But the programmer would joke about "those expensive memory references" being a bottleneck probably because memory tables are such a well known technique for speeding up expensive operations. - Don On Fri, 2006-12-08 at 00:39 +0100, Antoine de Maricourt wrote: > Last time I checked MT was under 20cc per call (I assume it's > inlined). On a P4, the shift operator is supposed to have a 4 cc > latency. The dependency chain to get the mask yields to 25 cc latency > before the mask can be used (4 per shift + 1 for OR) * 5. So it's > quite possible that this sequence dominates the call to rand(). Just > give a try... unless you use another processor with a better latency > for shift operations. But even with those processors it might be > benefic to replace 15 instructions by only one. Of course if you make > a specific implementation for small n (below 256), you can remove the > 2 last shift as well. > > Antoine > > I'm pretty sure the time of this function is dominated by the call to > > rand(), but it never occurred to do a table lookup for the mask, > > interesting idea. > > > > > > - Don > > > > > > On Thu, 2006-12-07 at 22:36 +0100, Antoine de Maricourt wrote: > > > > > If this randint routine is critical, you can save some calls to rand() > > > when you know that n is always below some value (see my previous post > > > about bitmap go). > > > For instance if n < 128 (probably true for 9x9 go), you could try: > > > > > > while (true) { > > > r = rand(); > > > > > > if ((r & v) < n) return (r & v); > > > r >>= 7; > > > if ((r & v) < n) return (r & v); > > > r >>= 7; > > > ... // 2 more times > > > } > > > > > > Also it could be better to read the mask (v) from a table (provided > > > the upper limit of n is small enough). At least on a P4, it will avoid > > > a long data dependency chain and might give a (probably small) > > > improvment. It might even be benefical on other processors. > > > > > > Antoine > > > > > > > > > > On Thu, 2006-12-07 at 16:05 +0100, Łukasz Lew wrote: > > > > > > > > > > > > > ii = pm::rand () % empty_v_cnt; // TODO improve speed "%" > > > > > > > > > > > > > > Try this, I think it could be faster, not sure, but has the advantage > > > > that it's slightly more correct. > > > > > > > > // returns an integer between 0 and n-1 inclusive > > > > // > > > > unsigned long randint(unsigned long n) > > > > { > > > > unsigned long v = n; > > > > unsigned long r; > > > > > > > > v--; > > > > v |= v >> 1; > > > > v |= v >> 2; > > > > v |= v >> 4; > > > > v |= v >> 8; > > > > v |= v >> 16; > > > > > > > > do { r = rand(); } while ( (r & v) >= n ); > > > > > > > > return( r & v ); > > > > } > > > > > > > > > > > > > > > > - Don > > > > > > > > > > > > _______________________________________________ > > > > computer-go mailing list > > > > computer-go@computer-go.org > > > > http://www.computer-go.org/mailman/listinfo/computer-go/ > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/