Mark Boon: <[EMAIL PROTECTED]>:
>
>On 4-May-08, at 14:57, Hideki Kato wrote:
>
>> By my observation (they are running on my PCs and
>> both are Q6600/3GHz with different motherboards), mogo_big_4core's
>> parallelism is around 300% (by top command), perhaps due to its
>> heavier UCT part (just my guess).
>
>Of course the CPU load doesn't really say how effective
>parallelization is. Recently I bought an octo-core Mac and have been
>running some tests. It takes time to get real conclusive data but I
>have some observations that come purely from some testing and
>watching. When using eight cores I get a speed-up of around six
>times. That is in number of playouts per second. I think that's a
>much more useful metric than looking at the CPU load.
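To put the two figures quoted above side by side, here is a minimal
Python sketch. It assumes the usual definition of parallel efficiency
(speed-up divided by core count) and that top reports 100% per fully
busy core; the function names are hypothetical and only the numbers
come from the messages above.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Measured speed-up divided by core count (1.0 = perfect scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


def cpu_utilization(top_percent: float, cores: int) -> float:
    """Fraction of the machine kept busy, assuming top shows 100% per core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return top_percent / (100.0 * cores)


# Eight cores giving about six times the playouts per second.
playout_efficiency = parallel_efficiency(speedup=6.0, cores=8)

# About 300% in top on a four-core Q6600.
load_fraction = cpu_utilization(top_percent=300.0, cores=4)

print(f"playout efficiency on 8 cores: {playout_efficiency:.0%}")
print(f"CPU load on 4 cores:           {load_fraction:.0%}")
```

The two numbers measure different things: the first counts useful
work (playouts per second), the second only says how busy the cores
are, so matching values do not imply matching playing strength.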
Yes, of course. It's just as I wrote.

>Still, even number of playouts is not the end-all I believe. I have
>the distinct impression that eight cores running for one second plays
>considerably worse than one core running for six seconds, even though
>the number of playouts is in the same ball-park. I haven't had the
>time to do an extensive test on that yet but I'm convinced that the
>picture is more complicated than just looking at total computing power.

I wrote a paper about this issue for GPW 2007 (in Japanese). Its
English abstract follows. The latter half addresses this problem, in
which parallel implementations of UCT perform worse than
single-threaded ones. The cause is that the UCT part creates and
evaluates positions _before_ the MC part (threads) has completely
finished its simulations.

----------------------------------
A Study on Implementing Parallel MC/UCT Algorithm

HIDEKI KATO and IKUO TAKEUCHI

We have developed a parallel MC/UCT computer Go program as a test bed
for our research, applied recurrent neural networks. We measured the
execution time of both commonly used shared-tree and client-server
implementations on two different types of systems, Intel Core 2 Quad
on a PC and Cell Broadband Engine on a SONY PLAYSTATION 3. The
client-server implementation runs three times faster and 10% slower
than shared-tree on the Playstation 3 and PC, respectively. Also, the
effect of a well-known problem, that parallelizing Monte Carlo
simulations may make the UCT algorithm behave differently, was
evaluated using the winning rates against GNU Go. Our experiments
using four cores show that the winning rates decrease by at most
35 Elo, and that this loss can be reduced to 20 Elo.
-----------------------------------

-Hideki

>Mark
--
[EMAIL PROTECTED] (Kato)
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/