Getting similar speed to Libego is very impressive. It’s important to start with a fast basic engine, since it will only get much slower as you add more knowledge. Currently I only get about 19k playouts per second on a 2 core Core2Duo 2.3 GHz, on 9 line empty board. I started at about 55k playouts per second on one core (for a UCT search).
The original one only beat gnugo 17% of the time, and the current one beats gnugo 93%, both with 5000 playouts, so in this case at least much slower is much stronger (testing on 9x9, against gnugo 3.7.10, gnugo --mode gtp --chinese-rules --capture-all-dead --level 10 --positional-superko). David From: computer-go-boun...@computer-go.org [mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de Veerdonk Sent: Monday, August 24, 2009 9:23 AM To: computer-go Subject: Re: [computer-go] Bitmap Go revisited and mockup implementation David, Confession: I have not tested 19x19. As you have noted, and others before you over the years, a 19x19 board does not fit in one but three 128-bit registers and there would be a rather big penalty as a result, perhaps (likely?) wiping out all of the benefits of bitmaps. Antoine voiced his disappointment about the speed advantage even on 9x9 in the e-mail to the list that served as my basis. My hope, as Hideko Kato points out, is in longer registers or perhaps being able to port this to a GPU. A 64-bit OS with twice the number registers would also surely help out. Nevertheless, I was surprised that I was able to get to almost the same level of speed as Libego provides. The goals for this mockup were very humble: (1) to provide a basic implementation to see what it can do (and if I could do it), and (2) learn how to work with assembly and/or intrinsic functions (I had never done that before). It may not be too hard to try 19x19 and just see what happens. I may attempt this as a follow-up. René van de Veerdonk 2009/8/23 David Fotland <fotl...@smart-games.com> How much would you lose for 19x19 board? A board representation is not very interesting unless it scales to 19 line boards. David From: computer-go-boun...@computer-go.org [mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de Veerdonk Sent: Sunday, August 23, 2009 1:11 PM To: computer-go@computer-go.org Subject: [computer-go] Bitmap Go revisited and mockup implementation Hi all, After years of reading this very interesting list, I decided to make my first contribution this week after reading once again about bitmap go. There is no freely available implementation of a bitmap based engine that I know of, so here is a start. Note that I am not a professional programmer, self-taught, and this is a part-time hobby, so do not expect perfect form. The attachment is an attempt to create an efficient implementation of a bitmap based light playout engine, following mostly the recommendations as spelled out by Antoine de Maricourt in his November 2006 message to this list. The application is single threaded and achieves around 75-80 kpps on my Centrino Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++ using the Microsoft Visual Studio 2008 IDE for Windows XP (32-bit) and undoubtedly suffers from portability issues as a result. The 128-bit SSE instructions are called through intrinsic functions, hopefully at least these interfaces are the same for other compilers. The single goal of this mockup was to do some benchmarking, so there is no gtp-interface, but that would be rather easy to add for those skilled in the art. The standard rules are used with one exception, the Brown eye-rule is used. Multi-stone suicide is explicitly forbidden. No superko checks are performed during playouts, and there is no mercy-rule either. I would be particularly interested to know if a 64-bit OS will improve the performance significantly, as it does have a lot less register pressure. Moving to a 64-bit OS will also make some instructions available that allow for further simplifications/efficiency improvements in the code. As will enabling SSE3 or SSE4.1 instructions. One note, the random number generator used is from Agner Fog's public library and was not included in the attachment. You will have to provide a random number generator yourself or download the appropriate version from his well-known website (www.agner.org/optimize) I hope this is the start of a discussion about further improvements that can be made, but you may use this code as you wish with no restrictions or guarantees. See the readme.txt file included in the attachment for more details (also partially included below). Comments and feedback welcome, René van de Veerdonk Excerpts from the readme.txt file: Introduction: ============= Mockup of a high performance implementation of the game of Go using bitmaps as the principle datastructures. This implementation is limited to measuring the performance of the principle components by playing random playouts from an empty 9x9 board, using Chinese scoring, not allowing pass moves until no other moves are available. Suicide is not allowed, neither for single or multi-stone groups. No checks are implemented for superko, but simple ko's are disallowed explicitly. Some implementation details: ============================ Benchmarking is done in test_board.cpp, where the number of playouts (n) and repeats (j) for each test is set. Benchmarking is done starting from an empty 9x9 board until two (three with ko) passes are played in sequence. The benchmark playout follow the same scheme as method board_t::play_random_game, but with timing instrumentation added. Clock counts (cc) are measured using the cc_time_of macro. Total run-time measurement relies on the windows library. For each move during a playout, the program goes through a sequence of (1) identify all legal moves (~50cc) this follows Brown's standard definition; this definition differs from most Monte-Carlo engines and has a different set of pros and cons; bitmaps allow all moves to be processed in parallel (2) pick a uniformly random move from the available choices (~75cc) relies on a modulus operation; is not 100% uniform to save speed (3) playing it on the board (~170cc) lengthiest method; but rather straightforward Playouts end after 256 moves or whenever two passes are played in succession (three in case of a ko). The routines for each of these basic steps are contained in the file board.cpp. The underlying datastructures are 128-bit bitmaps, for which the sse2 instructions used to manipulate them are contained in the file bitmap.cpp. Further improvements can be expected from using any 64-bit OS that supports several more assembly instructions, and from using sse4.1 instructions. Example output (on a Intel 2.40 GHz Centrino Duo labtop): ========================================================= [game] = 30469.7, [moves] = 111.071 [game] = 30245.4, [moves] = 111.068 [game] = 30221.7, [moves] = 111.089 [game] = 30264.8, [moves] = 111.122 [game] = 30205.8, [moves] = 111.101 [game] = 77.1084 kpps, [moves] = 111.023 [game] = 78.0488 kpps, [moves] = 111.045 [game] = 77.1084 kpps, [moves] = 111.046 [game] = 78.0488 kpps, [moves] = 111.131 [game] = 78.0488 kpps, [moves] = 111.082 [legal] 110/51, [pick] 110/74, [play] 106/169, [score] 40, [win] 0.4187 [legal] 111/51, [pick] 111/74, [play] 106/168, [score] 42, [win] 0.4201 [legal] 111/52, [pick] 111/74, [play] 106/169, [score] 43, [win] 0.4276 [legal] 111/52, [pick] 111/75, [play] 106/170, [score] 41, [win] 0.4092 [legal] 111/51, [pick] 111/74, [play] 106/171, [score] 45, [win] 0.4221 finished, press any key to exit Explanation of output: ====================== There are three sets of repeated (5x) measurements, performed and timed independently to avoid cross-contamination of inserted serialization instructions and timing overhead (which still does not quite prevent reordering of instructions across timing boundaries). [game] cc's per full playout [moves] average number of moves per playout [game] thousands of playouts per second (kpps) [moves] average number of moves per playout "calls per playout" / "cc's per call" [legal] identifies all legal moves on the board [pick] picks a random move [play] processes a move [score] scores a final position [win] winrate for black Acknowledgements: ================= Antoine de Maricourt, for his very helpful description of his bitmap go implementation in detail on the computer-go mailing list: http://computer-go.org/pipermail/computer-go/2006-November/007136.html Gunnar Farneback, for providing the well documented Brown engine, whoms legal move definition is used in this mockup Lukasz Lew, for the high performance Libego implementation, which has inspired various parts of this mockup as well Agner Fog, for the Mersenne Twister and clock-cycle timing library Computer-go mailing list, where many ideas are posted and discussed _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/