David,
Confession: I have not tested 19x19. As you have noted, and others before
you over the years, a 19x19 board does not fit in one but three 128-bit
registers and there would be a rather big penalty as a result, perhaps
(likely?) wiping out all of the benefits of bitmaps. Antoine voiced his
disappointment about the speed advantage even on 9x9 in the e-mail to the
list that served as my basis. My hope, as Hideko Kato points out, is in
longer registers or perhaps being able to port this to a GPU. A 64-bit OS
with twice the number registers would also surely help out. Nevertheless, I
was surprised that I was able to get to almost the same level of speed as
Libego provides.

The goals for this mockup were very humble: (1) to provide a basic
implementation to see what it can do (and if I could do it), and (2) learn
how to work with assembly and/or intrinsic functions (I had never done that
before). It may not be too hard to try 19x19 and just see what happens. I
may attempt this as a follow-up.

René van de Veerdonk

2009/8/23 David Fotland <fotl...@smart-games.com>

>  How much would you lose for 19x19 board?  A board representation is not
> very interesting unless it scales to 19 line boards.
>
>
>
> David
>
>
>
> *From:* computer-go-boun...@computer-go.org [mailto:
> computer-go-boun...@computer-go.org] *On Behalf Of *René van de Veerdonk
> *Sent:* Sunday, August 23, 2009 1:11 PM
> *To:* computer-go@computer-go.org
> *Subject:* [computer-go] Bitmap Go revisited and mockup implementation
>
>
>
> Hi all,
>
>
>
> After years of reading this very interesting list, I decided to make my
> first contribution this week after reading once again about bitmap go. There
> is no freely available implementation of a bitmap based engine that I know
> of, so here is a start. Note that I am not a professional programmer,
> self-taught, and this is a part-time hobby, so do not expect perfect form.
>
>
>
> The attachment is an attempt to create an efficient implementation of a
> bitmap based light playout engine, following mostly the recommendations as
> spelled out by Antoine de Maricourt in his November 2006 message to this
> list. The application is single threaded and achieves around 75-80 kpps on
> my Centrino Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++
> using the Microsoft Visual Studio 2008 IDE for Windows XP (32-bit) and
> undoubtedly suffers from portability issues as a result. The 128-bit SSE
> instructions are called through intrinsic functions, hopefully at least
> these interfaces are the same for other compilers. The single goal of this
> mockup was to do some benchmarking, so there is no gtp-interface, but that
> would be rather easy to add for those skilled in the art. The standard rules
> are used with one exception, the Brown eye-rule is used. Multi-stone suicide
> is explicitly forbidden. No superko checks are performed during playouts,
> and there is no mercy-rule either.
>
>
>
> I would be particularly interested to know if a 64-bit OS will improve the
> performance significantly, as it does have a lot less register pressure.
> Moving to a 64-bit OS will also make some instructions available that allow
> for further simplifications/efficiency improvements in the code. As will
> enabling SSE3 or SSE4.1 instructions.
>
>
>
> One note, the random number generator used is from Agner Fog's public
> library and was not included in the attachment. You will have to provide a
> random number generator yourself or download the appropriate version from
> his well-known website (www.agner.org/optimize)
>
>
>
> I hope this is the start of a discussion about further improvements that
> can be made, but you may use this code as you wish with no restrictions or
> guarantees. See the readme.txt file included in the attachment for more
> details (also partially included below).
>
>
>
> Comments and feedback welcome,
>
>
>
> René van de Veerdonk
>
>
>
> Excerpts from the readme.txt file:
>
>
>
> Introduction:
>
> =============
>
>
>
> Mockup of a high performance implementation of the game of Go using bitmaps
>
> as the principle datastructures. This implementation is limited to
> measuring
>
> the performance of the principle components by playing random playouts from
>
> an empty 9x9 board, using Chinese scoring, not allowing pass moves until
>
> no other moves are available. Suicide is not allowed, neither for single
>
> or multi-stone groups. No checks are implemented for superko, but simple
>
> ko's are disallowed explicitly.
>
>
>
> Some implementation details:
>
> ============================
>
>
>
> Benchmarking is done in test_board.cpp, where the number of playouts (n)
> and
>
> repeats (j) for each test is set. Benchmarking is done starting from an
> empty
>
> 9x9 board until two (three with ko) passes are played in sequence. The
>
> benchmark playout follow the same scheme as method
> board_t::play_random_game,
>
> but with timing instrumentation added.
>
>
>
> Clock counts (cc) are measured using the cc_time_of macro. Total run-time
>
> measurement relies on the windows library.
>
>
>
> For each move during a playout, the program goes through a sequence of
>
> (1) identify all legal moves (~50cc)
>
>     this follows Brown's standard definition; this definition differs from
>
>     most Monte-Carlo engines and has a different set of pros and cons;
>
>     bitmaps allow all moves to be processed in parallel
>
> (2) pick a uniformly random move from the available choices (~75cc)
>
>     relies on a modulus operation; is not 100% uniform to save speed
>
> (3) playing it on the board (~170cc)
>
>     lengthiest method; but rather straightforward
>
>
>
> Playouts end after 256 moves or whenever two passes are played in
> succession
>
> (three in case of a ko). The routines for each of these basic steps are
>
> contained in the file board.cpp. The underlying datastructures are 128-bit
>
> bitmaps, for which the sse2 instructions used to manipulate them are
> contained
>
> in the file bitmap.cpp.
>
>
>
> Further improvements can be expected from using any 64-bit OS that supports
>
> several more assembly instructions, and from using sse4.1 instructions.
>
>
>
> Example output (on a Intel 2.40 GHz Centrino Duo labtop):
>
> =========================================================
>
>
>
> [game] = 30469.7, [moves] = 111.071
>
> [game] = 30245.4, [moves] = 111.068
>
> [game] = 30221.7, [moves] = 111.089
>
> [game] = 30264.8, [moves] = 111.122
>
> [game] = 30205.8, [moves] = 111.101
>
>
>
> [game] = 77.1084 kpps, [moves] = 111.023
>
> [game] = 78.0488 kpps, [moves] = 111.045
>
> [game] = 77.1084 kpps, [moves] = 111.046
>
> [game] = 78.0488 kpps, [moves] = 111.131
>
> [game] = 78.0488 kpps, [moves] = 111.082
>
>
>
> [legal] 110/51, [pick] 110/74, [play] 106/169, [score] 40, [win] 0.4187
>
> [legal] 111/51, [pick] 111/74, [play] 106/168, [score] 42, [win] 0.4201
>
> [legal] 111/52, [pick] 111/74, [play] 106/169, [score] 43, [win] 0.4276
>
> [legal] 111/52, [pick] 111/75, [play] 106/170, [score] 41, [win] 0.4092
>
> [legal] 111/51, [pick] 111/74, [play] 106/171, [score] 45, [win] 0.4221
>
>
>
> finished, press any key to exit
>
>
>
> Explanation of output:
>
> ======================
>
>
>
> There are three sets of repeated (5x) measurements, performed and timed
>
> independently to avoid cross-contamination of inserted serialization
>
> instructions and timing overhead (which still does not quite prevent
>
> reordering of instructions across timing boundaries).
>
>
>
> [game]  cc's per full playout
>
> [moves] average number of moves per playout
>
>
>
> [game]  thousands of playouts per second (kpps)
>
> [moves] average number of moves per playout
>
>
>
> "calls per playout" / "cc's per call"
>
> [legal] identifies all legal moves on the board
>
> [pick]  picks a random move
>
> [play]  processes a move
>
> [score] scores a final position
>
> [win]   winrate for black
>
>
>
> Acknowledgements:
>
> =================
>
>
>
> Antoine de Maricourt, for his very helpful description of his bitmap go
>
>   implementation in detail on the computer-go mailing list:
>
>   http://computer-go.org/pipermail/computer-go/2006-November/007136.html
>
> Gunnar Farneback, for providing the well documented Brown engine,
>
>   whoms legal move definition is used in this mockup
>
> Lukasz Lew, for the high performance Libego implementation, which has
> inspired
>
>   various parts of this mockup as well
>
> Agner Fog, for the Mersenne Twister and clock-cycle timing library
>
> Computer-go mailing list, where many ideas are posted and discussed
>
>
>
>
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to