2009/8/24 René van de Veerdonk <rene.vandeveerd...@gmail.com>:
> David,
> Confession: I have not tested 19x19. As you have noted, and others before
> you over the years, a 19x19 board does not fit in one but three 128-bit
> registers and there would be a rather big penalty as a result, perhaps
> (likely?) wiping out all of the benefits of bitmaps. Antoine voiced his
> disappointment about the speed advantage even on 9x9 in the e-mail to the
> list that served as my basis. My hope, as Hideko Kato points out, is in
> longer registers or perhaps being able to port this to a GPU. A 64-bit OS
> with twice the number registers would also surely help out. Nevertheless, I
> was surprised that I was able to get to almost the same level of speed as
> Libego provides.

Libego provides 105-108 kpps on a single Core2 2.4 GHz on 9x9.
and 22-23 kpps on 19x19

Can you test your bitmap implementation on 19x19?

Lukasz

> The goals for this mockup were very humble: (1) to provide a basic
> implementation to see what it can do (and if I could do it), and (2) learn
> how to work with assembly and/or intrinsic functions (I had never done that
> before). It may not be too hard to try 19x19 and just see what happens. I
> may attempt this as a follow-up.
> René van de Veerdonk
> 2009/8/23 David Fotland <fotl...@smart-games.com>
>>
>> How much would you lose for 19x19 board?  A board representation is not
>> very interesting unless it scales to 19 line boards.
>>
>>
>>
>> David
>>
>>
>>
>> From: computer-go-boun...@computer-go.org
>> [mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de
>> Veerdonk
>> Sent: Sunday, August 23, 2009 1:11 PM
>> To: computer-go@computer-go.org
>> Subject: [computer-go] Bitmap Go revisited and mockup implementation
>>
>>
>>
>> Hi all,
>>
>>
>>
>> After years of reading this very interesting list, I decided to make my
>> first contribution this week after reading once again about bitmap go. There
>> is no freely available implementation of a bitmap based engine that I know
>> of, so here is a start. Note that I am not a professional programmer,
>> self-taught, and this is a part-time hobby, so do not expect perfect form.
>>
>>
>>
>> The attachment is an attempt to create an efficient implementation of a
>> bitmap based light playout engine, following mostly the recommendations as
>> spelled out by Antoine de Maricourt in his November 2006 message to this
>> list. The application is single threaded and achieves around 75-80 kpps on
>> my Centrino Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++
>> using the Microsoft Visual Studio 2008 IDE for Windows XP (32-bit) and
>> undoubtedly suffers from portability issues as a result. The 128-bit SSE
>> instructions are called through intrinsic functions, hopefully at least
>> these interfaces are the same for other compilers. The single goal of this
>> mockup was to do some benchmarking, so there is no gtp-interface, but that
>> would be rather easy to add for those skilled in the art. The standard rules
>> are used with one exception, the Brown eye-rule is used. Multi-stone suicide
>> is explicitly forbidden. No superko checks are performed during playouts,
>> and there is no mercy-rule either.
>>
>>
>>
>> I would be particularly interested to know if a 64-bit OS will improve the
>> performance significantly, as it does have a lot less register pressure.
>> Moving to a 64-bit OS will also make some instructions available that allow
>> for further simplifications/efficiency improvements in the code. As will
>> enabling SSE3 or SSE4.1 instructions.
>>
>>
>>
>> One note, the random number generator used is from Agner Fog's public
>> library and was not included in the attachment. You will have to provide a
>> random number generator yourself or download the appropriate version from
>> his well-known website (www.agner.org/optimize)
>>
>>
>>
>> I hope this is the start of a discussion about further improvements that
>> can be made, but you may use this code as you wish with no restrictions or
>> guarantees. See the readme.txt file included in the attachment for more
>> details (also partially included below).
>>
>>
>>
>> Comments and feedback welcome,
>>
>>
>>
>> René van de Veerdonk
>>
>>
>>
>> Excerpts from the readme.txt file:
>>
>>
>>
>> Introduction:
>>
>> =============
>>
>>
>>
>> Mockup of a high performance implementation of the game of Go using
>> bitmaps
>>
>> as the principle datastructures. This implementation is limited to
>> measuring
>>
>> the performance of the principle components by playing random playouts
>> from
>>
>> an empty 9x9 board, using Chinese scoring, not allowing pass moves until
>>
>> no other moves are available. Suicide is not allowed, neither for single
>>
>> or multi-stone groups. No checks are implemented for superko, but simple
>>
>> ko's are disallowed explicitly.
>>
>>
>>
>> Some implementation details:
>>
>> ============================
>>
>>
>>
>> Benchmarking is done in test_board.cpp, where the number of playouts (n)
>> and
>>
>> repeats (j) for each test is set. Benchmarking is done starting from an
>> empty
>>
>> 9x9 board until two (three with ko) passes are played in sequence. The
>>
>> benchmark playout follow the same scheme as method
>> board_t::play_random_game,
>>
>> but with timing instrumentation added.
>>
>>
>>
>> Clock counts (cc) are measured using the cc_time_of macro. Total run-time
>>
>> measurement relies on the windows library.
>>
>>
>>
>> For each move during a playout, the program goes through a sequence of
>>
>> (1) identify all legal moves (~50cc)
>>
>>     this follows Brown's standard definition; this definition differs from
>>
>>     most Monte-Carlo engines and has a different set of pros and cons;
>>
>>     bitmaps allow all moves to be processed in parallel
>>
>> (2) pick a uniformly random move from the available choices (~75cc)
>>
>>     relies on a modulus operation; is not 100% uniform to save speed
>>
>> (3) playing it on the board (~170cc)
>>
>>     lengthiest method; but rather straightforward
>>
>>
>>
>> Playouts end after 256 moves or whenever two passes are played in
>> succession
>>
>> (three in case of a ko). The routines for each of these basic steps are
>>
>> contained in the file board.cpp. The underlying datastructures are 128-bit
>>
>> bitmaps, for which the sse2 instructions used to manipulate them are
>> contained
>>
>> in the file bitmap.cpp.
>>
>>
>>
>> Further improvements can be expected from using any 64-bit OS that
>> supports
>>
>> several more assembly instructions, and from using sse4.1 instructions.
>>
>>
>>
>> Example output (on a Intel 2.40 GHz Centrino Duo labtop):
>>
>> =========================================================
>>
>>
>>
>> [game] = 30469.7, [moves] = 111.071
>>
>> [game] = 30245.4, [moves] = 111.068
>>
>> [game] = 30221.7, [moves] = 111.089
>>
>> [game] = 30264.8, [moves] = 111.122
>>
>> [game] = 30205.8, [moves] = 111.101
>>
>>
>>
>> [game] = 77.1084 kpps, [moves] = 111.023
>>
>> [game] = 78.0488 kpps, [moves] = 111.045
>>
>> [game] = 77.1084 kpps, [moves] = 111.046
>>
>> [game] = 78.0488 kpps, [moves] = 111.131
>>
>> [game] = 78.0488 kpps, [moves] = 111.082
>>
>>
>>
>> [legal] 110/51, [pick] 110/74, [play] 106/169, [score] 40, [win] 0.4187
>>
>> [legal] 111/51, [pick] 111/74, [play] 106/168, [score] 42, [win] 0.4201
>>
>> [legal] 111/52, [pick] 111/74, [play] 106/169, [score] 43, [win] 0.4276
>>
>> [legal] 111/52, [pick] 111/75, [play] 106/170, [score] 41, [win] 0.4092
>>
>> [legal] 111/51, [pick] 111/74, [play] 106/171, [score] 45, [win] 0.4221
>>
>>
>>
>> finished, press any key to exit
>>
>>
>>
>> Explanation of output:
>>
>> ======================
>>
>>
>>
>> There are three sets of repeated (5x) measurements, performed and timed
>>
>> independently to avoid cross-contamination of inserted serialization
>>
>> instructions and timing overhead (which still does not quite prevent
>>
>> reordering of instructions across timing boundaries).
>>
>>
>>
>> [game]  cc's per full playout
>>
>> [moves] average number of moves per playout
>>
>>
>>
>> [game]  thousands of playouts per second (kpps)
>>
>> [moves] average number of moves per playout
>>
>>
>>
>> "calls per playout" / "cc's per call"
>>
>> [legal] identifies all legal moves on the board
>>
>> [pick]  picks a random move
>>
>> [play]  processes a move
>>
>> [score] scores a final position
>>
>> [win]   winrate for black
>>
>>
>>
>> Acknowledgements:
>>
>> =================
>>
>>
>>
>> Antoine de Maricourt, for his very helpful description of his bitmap go
>>
>>   implementation in detail on the computer-go mailing list:
>>
>>   http://computer-go.org/pipermail/computer-go/2006-November/007136.html
>>
>> Gunnar Farneback, for providing the well documented Brown engine,
>>
>>   whoms legal move definition is used in this mockup
>>
>> Lukasz Lew, for the high performance Libego implementation, which has
>> inspired
>>
>>   various parts of this mockup as well
>>
>> Agner Fog, for the Mersenne Twister and clock-cycle timing library
>>
>> Computer-go mailing list, where many ideas are posted and discussed
>>
>>
>>
>>
>>
>> _______________________________________________
>> computer-go mailing list
>> computer-go@computer-go.org
>> http://www.computer-go.org/mailman/listinfo/computer-go/
>
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to