Re: [computer-go] Lockless hash table and other parallel search ideas

Don Dailey Fri, 21 Mar 2008 18:11:19 -0700

These are used in parallel chess programs, and it's very common.   A
pretty good article on this written by Hyatt (Crafty programmer and
author of former world computer chess champion Cray Blitz) and it's
called  "A lock-less transposition table implementation for parallel
search chess engines",


I see an on-line version of a similar article here:

     http://www.cis.uab.edu/hyatt/hashing.html


- Don




Rémi Coulom wrote:
> Hi,
>
> I have got a lockless hash table to work, and I thought I'd share the
> results.
>
> A lockless hash table is very important, because the usual approach
> that consists in using a global lock during tree search and update
> does not scale well, especially on 9x9. But it is possible to create a
> completely lockless hash table data structure that works with multiple
> threads.
>
> Here are some links that give some indications of how such a thing can
> be done:
> http://video.google.com/videoplay?docid=2139967204534450862
> http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html
> http://www.cs.rug.nl/~wim/mechver/hashtable/index.html
>
> Some of those links may look intimidating, but that is because the
> resize part of the algorithm is complicated. In my implementation, I
> do not resize the table, so it is very simple. Also, I update counter
> in each node with atomic increments and decrements (no need to lock).
>
> Here is some preliminary experimental data for 9x9 up to 8 cores,
> running 840,000 playouts, from a tactical middle-game position:
>
> (Cores / Playouts per second with spinlock / Playouts per second with
> lockless hash table)
> 1  22,477.9  22,447.9
> 2  37,769.8  43,076.9
> 3  55,888.2  60,825.5
> 4  61,448.4  79,470.2
> 5  64,665.1  95,346.2
> 6  62,407.1 110,092.0
> 7  66,508.3 130,638.0
> 8  59,196.6 146,341.0
>
> BTW, using a pthread mutex is a lot worse than a spinlock, in my
> experience. I used the fair spinlock from the Linux kernel. But any
> implementation should work.
>
> So, it is pretty cool. This was measured on only one run. Since it is
> not deterministic, performance may vary from one run to the other
> (especially since it does not always select the same best move). But
> it still clearly shows the superiority of the lockless hash table, and
> seems to indicate that it would still scale beyond 8 cores.
>
> I believe I could improve further by reducing the number of atomic
> operations. Also, thinking about how to reduce atomic operations led
> me to imagine a scheme that may works as a distributed hash table over
> a network of PCs.
>
> A simple scheme that would work on a single PC would consist in
> storing one counter per thread in the table. This way, it would not be
> necessary to use atomic operations to increment counters, and the
> cache coherency mechanism of the CPUs would handle sending data from
> core to core. The cost would be that in order to get the node
> counters, it would be necessary to add N values. Also, some
> information may arrive a little late to some threads (but I believe it
> is better to go run a playout rather than wait for data).
>
> This scheme is a bit equivalent to using a separate hash table for
> each thread, and could be generalized to a distributed hash table over
> a network: each PC would have its own hash table, and each node of the
> tree would contain two counters: my_counter and other_counter. Every
> now and then, for instance when my_counter reaches a threshold, this
> PC would broadcast my_counter to the whole network, so that everybody
> updates other_counter.
>
> I have not implemented this yet, but I will probably try it.
>
> Right now, I will test the lockless hash table more, and will probably
> connect to 9x9 CGOS with that machine sometime during the week-end.
>
> If you wish to implement your own lockless hash table, you should read
> Intel's documentation about its memory architecture. It can be found
> there:
> http://www.intel.com/products/processor/manuals/index.htm
> In particular, it is important to read "Architecture Memory Ordering
> White Paper", and about the lock prefix, the cmpxchg operation,
> sfence, lfence, and mfence.
>
> The primitives I used in my algorithm are a store fence, atomic
> increment, atomic decrement, atomic compare and swap. If you
> understand what they do, you should be able to make your own lockless
> hash table.
>
> Have fun,
>
> Rémi
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Lockless hash table and other parallel search ideas

Reply via email to