These are used in parallel chess programs, and it's very common. A pretty good article on this written by Hyatt (Crafty programmer and author of former world computer chess champion Cray Blitz) and it's called "A lock-less transposition table implementation for parallel search chess engines",
I see an on-line version of a similar article here: http://www.cis.uab.edu/hyatt/hashing.html - Don Rémi Coulom wrote: > Hi, > > I have got a lockless hash table to work, and I thought I'd share the > results. > > A lockless hash table is very important, because the usual approach > that consists in using a global lock during tree search and update > does not scale well, especially on 9x9. But it is possible to create a > completely lockless hash table data structure that works with multiple > threads. > > Here are some links that give some indications of how such a thing can > be done: > http://video.google.com/videoplay?docid=2139967204534450862 > http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html > http://www.cs.rug.nl/~wim/mechver/hashtable/index.html > > Some of those links may look intimidating, but that is because the > resize part of the algorithm is complicated. In my implementation, I > do not resize the table, so it is very simple. Also, I update counter > in each node with atomic increments and decrements (no need to lock). > > Here is some preliminary experimental data for 9x9 up to 8 cores, > running 840,000 playouts, from a tactical middle-game position: > > (Cores / Playouts per second with spinlock / Playouts per second with > lockless hash table) > 1 22,477.9 22,447.9 > 2 37,769.8 43,076.9 > 3 55,888.2 60,825.5 > 4 61,448.4 79,470.2 > 5 64,665.1 95,346.2 > 6 62,407.1 110,092.0 > 7 66,508.3 130,638.0 > 8 59,196.6 146,341.0 > > BTW, using a pthread mutex is a lot worse than a spinlock, in my > experience. I used the fair spinlock from the Linux kernel. But any > implementation should work. > > So, it is pretty cool. This was measured on only one run. Since it is > not deterministic, performance may vary from one run to the other > (especially since it does not always select the same best move). But > it still clearly shows the superiority of the lockless hash table, and > seems to indicate that it would still scale beyond 8 cores. > > I believe I could improve further by reducing the number of atomic > operations. Also, thinking about how to reduce atomic operations led > me to imagine a scheme that may works as a distributed hash table over > a network of PCs. > > A simple scheme that would work on a single PC would consist in > storing one counter per thread in the table. This way, it would not be > necessary to use atomic operations to increment counters, and the > cache coherency mechanism of the CPUs would handle sending data from > core to core. The cost would be that in order to get the node > counters, it would be necessary to add N values. Also, some > information may arrive a little late to some threads (but I believe it > is better to go run a playout rather than wait for data). > > This scheme is a bit equivalent to using a separate hash table for > each thread, and could be generalized to a distributed hash table over > a network: each PC would have its own hash table, and each node of the > tree would contain two counters: my_counter and other_counter. Every > now and then, for instance when my_counter reaches a threshold, this > PC would broadcast my_counter to the whole network, so that everybody > updates other_counter. > > I have not implemented this yet, but I will probably try it. > > Right now, I will test the lockless hash table more, and will probably > connect to 9x9 CGOS with that machine sometime during the week-end. > > If you wish to implement your own lockless hash table, you should read > Intel's documentation about its memory architecture. It can be found > there: > http://www.intel.com/products/processor/manuals/index.htm > In particular, it is important to read "Architecture Memory Ordering > White Paper", and about the lock prefix, the cmpxchg operation, > sfence, lfence, and mfence. > > The primitives I used in my algorithm are a store fence, atomic > increment, atomic decrement, atomic compare and swap. If you > understand what they do, you should be able to make your own lockless > hash table. > > Have fun, > > Rémi > _______________________________________________ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ > _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/