Very interesting stuff. One glimmer of hope is that the memory situation
should improve over time, since memory grows while Go boards stay the same size.
Christian Nentwich wrote:
Mark,
let me try to add some more context to answer your questions. When I say
in my conclusion that "it's not worth it", I mean it's not worth using
the GPU to run playout algorithms of the sort that are in use today.
There may be many other algorithms that form part of Go engines where
the GPU can provide an order-of-magnitude speedup, and still others
where the GPU can usefully run in parallel with the CPU.
In my experiments, a CPU core got 47,000 playouts per second and the GPU
170,000. But:
- My computer has two cores (so it gets 94,000 playouts per second with
2 threads)
- My computer's processor (an Intel Core 2 Duo E6600) is 3 years old and
far from state of the art
- My graphics card (a GeForce GTX 285), on the other hand, was recently
purchased and is one of the top graphics cards
That means that my old CPU already gets more than half the speed of the
GPU: 94,000 playouts per second against 170,000. An Intel Nehalem
processor would surely beat the GPU, let alone an 8-core system. Bearing
in mind the severe drawbacks of the GPU (these are not general purpose
processors, and there is much you can't do on them), this limits its
usefulness with this algorithm. Compare this speedup to truly highly
parallel algorithms: random number generation, matrix multiplication,
Monte Carlo simulation of options (which are highly parallel because
there is no branching and little data); with those you see speedups of
10x to 100x over the CPU.
The 9% occupancy may be puzzling, but there is little that can be done
about it. This, and the talk of threads and blocks, would take a while
to explain, because GPUs don't work like general purpose CPUs. They are
SIMD processors, meaning that each processor can run many threads in
parallel on different items of data, but only if *all threads are
executing the same instruction*; there is only one instruction-decoding
stage per processor per cycle. If any "if" statements or loops diverge,
threads will be serialised until they join again. The 9% occupancy is a
function of the amount of data needed to perform the task and of the
branch divergence (caused by the playouts being different). There is
little that can be done about it other than using a completely
different algorithm.
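To make the divergence point concrete, here is a minimal sketch (the
kernel and names are invented for illustration, not taken from my
playout code):

// Illustrative kernel, not code from the playout experiment. Threads
// in the same warp that take different sides of the "if" are
// serialised: the hardware runs one path while the other threads idle,
// then the other path, before the warp reconverges.
__global__ void divergent_example(const int *moves, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;

    if (moves[i] % 2 == 0)       // warp members disagree here, so the
        out[i] = moves[i] * 3;   // hardware runs this branch first...
    else
        out[i] = moves[i] + 7;   // ...then this one, serially.
}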
If you google "CUDA block threads" you will find out more. In short, the
GPU runs like a grid cluster. In each block, 64 threads run in parallel,
conceptually. On the actual hardware, in each processor 16 threads from
one block will execute, followed by 16 from another ("half-warps"). If
any threads are blocked (memory reads cost ~400 cycles!), threads from
another block are scheduled instead. So the answer is: yes, there are
64 * 80 threads conceptually, but they are not all scheduled at the
same time.
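To make the launch shape concrete (a fragment with invented buffer
names; the playout kernel itself is sketched further down, in reply to
the AMAF question):

// 80 blocks of 64 threads: the 64 * 80 threads mentioned above,
// one playout per thread.
playout<<<80, 64>>>(dev_results);
cudaDeviceSynchronize();   // wait until every block has drained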
Comments on specific questions below.
> If parallelism is what you're looking for, why not have one thread per
> move candidate? Use that to collect AMAF statistics. 16KB is not a lot
> to work with, so the statistics may have to be shared.
One thread per move candidate is feasible with the architecture I used,
since every thread has its own board. I have not implemented AMAF, so I
cannot comment on the statistics bit, but the "output" of your algorithm
typically does not live in the 16KB of shared memory anyway; you'd write
it to global memory (1GB). Would uniform random playouts be good enough
for this, though?
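A sketch of what I mean (the names and the board-size constant are
assumptions for illustration, not my actual code):

#define BOARD_BYTES 240   // assumed compact board representation

// Hypothetical stand-in for the real playout routine.
__device__ int run_playout(unsigned char *board) { return 0; }

// Each thread plays out on a board kept in shared memory; only the
// result goes to global memory, where the host reads it back.
__global__ void playout(int *results)
{
    __shared__ unsigned char boards[64][BOARD_BYTES];  // 64*240 < 16KB
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    results[tid] = run_playout(boards[threadIdx.x]);
}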
> Another question I'd have is whether putting in two graphics cards
> would double the capacity.
Yes it would. It would pretty much precisely double it (the "grid" to
schedule over just gets larger, but there is no additional overhead).
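For example (a sketch using today's CUDA runtime API; it assumes one
result buffer has been allocated per device):

// Split the grid across however many cards are present. cudaSetDevice
// selects the card that subsequent calls apply to, and kernel launches
// are asynchronous, so both cards run concurrently.
int ndev = 0;
cudaGetDeviceCount(&ndev);                 // 2 with two cards installed
for (int d = 0; d < ndev; d++) {
    cudaSetDevice(d);
    playout<<<40, 64>>>(dev_results[d]);   // half of the 80 blocks each
}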
> Did you try this for 9x9 or 19x19?
I used 19x19. If you do it for 9x9, you can probably run 128 threads per
block because of the smaller board representation. The speedup would be
correspondingly larger (4x or more). I chose 19x19 because of the severe
memory limitations of the architecture; it seemed that 9x9 would just
make my life a bit too easy for comfort...
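The back-of-the-envelope arithmetic behind those thread counts (the
bytes-per-board figures are my assumptions about a compact
representation, not measurements):

// Shared memory budget per block: 16384 bytes.
// 19x19: ~240 bytes/board -> 16384 / 240 ~ 68  -> 64 threads per block
//  9x9:  ~100 bytes/board -> 16384 / 100 ~ 163 -> 128 threads per block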
Christian
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/