Getting similar speed to Libego is very impressive.  It’s important to start 
with a fast basic engine, since it will only get much slower as you add more 
knowledge.  Currently I only get about 19k playouts per second on a 2 core 
Core2Duo 2.3 GHz, on 9 line empty board.  I started at about 55k playouts per 
second on one core (for a UCT search).

 

The original one only beat gnugo 17% of the time, and the current one beats 
gnugo 93%, both with 5000 playouts, so in this case at least much slower is 
much stronger (testing on 9x9, against gnugo 3.7.10,

gnugo --mode gtp --chinese-rules --capture-all-dead --level 10 
--positional-superko).

 

David

 

From: computer-go-boun...@computer-go.org 
[mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de Veerdonk
Sent: Monday, August 24, 2009 9:23 AM
To: computer-go
Subject: Re: [computer-go] Bitmap Go revisited and mockup implementation

 

David,

 

Confession: I have not tested 19x19. As you have noted, and others before you 
over the years, a 19x19 board does not fit in one but three 128-bit registers 
and there would be a rather big penalty as a result, perhaps (likely?) wiping 
out all of the benefits of bitmaps. Antoine voiced his disappointment about the 
speed advantage even on 9x9 in the e-mail to the list that served as my basis. 
My hope, as Hideko Kato points out, is in longer registers or perhaps being 
able to port this to a GPU. A 64-bit OS with twice the number registers would 
also surely help out. Nevertheless, I was surprised that I was able to get to 
almost the same level of speed as Libego provides.

 

The goals for this mockup were very humble: (1) to provide a basic 
implementation to see what it can do (and if I could do it), and (2) learn how 
to work with assembly and/or intrinsic functions (I had never done that 
before). It may not be too hard to try 19x19 and just see what happens. I may 
attempt this as a follow-up.

 

René van de Veerdonk

 

2009/8/23 David Fotland <fotl...@smart-games.com>

How much would you lose for 19x19 board?  A board representation is not very 
interesting unless it scales to 19 line boards.

 

David

 

From: computer-go-boun...@computer-go.org 
[mailto:computer-go-boun...@computer-go.org] On Behalf Of René van de Veerdonk
Sent: Sunday, August 23, 2009 1:11 PM
To: computer-go@computer-go.org
Subject: [computer-go] Bitmap Go revisited and mockup implementation

 

Hi all,

 

After years of reading this very interesting list, I decided to make my first 
contribution this week after reading once again about bitmap go. There is no 
freely available implementation of a bitmap based engine that I know of, so 
here is a start. Note that I am not a professional programmer, self-taught, and 
this is a part-time hobby, so do not expect perfect form.

 

The attachment is an attempt to create an efficient implementation of a bitmap 
based light playout engine, following mostly the recommendations as spelled out 
by Antoine de Maricourt in his November 2006 message to this list. The 
application is single threaded and achieves around 75-80 kpps on my Centrino 
Duo 2.4 GHz Intel labtop. The mockup was developed in C/C++ using the Microsoft 
Visual Studio 2008 IDE for Windows XP (32-bit) and undoubtedly suffers from 
portability issues as a result. The 128-bit SSE instructions are called through 
intrinsic functions, hopefully at least these interfaces are the same for other 
compilers. The single goal of this mockup was to do some benchmarking, so there 
is no gtp-interface, but that would be rather easy to add for those skilled in 
the art. The standard rules are used with one exception, the Brown eye-rule is 
used. Multi-stone suicide is explicitly forbidden. No superko checks are 
performed during playouts, and there is no mercy-rule either.

 

I would be particularly interested to know if a 64-bit OS will improve the 
performance significantly, as it does have a lot less register pressure. Moving 
to a 64-bit OS will also make some instructions available that allow for 
further simplifications/efficiency improvements in the code. As will enabling 
SSE3 or SSE4.1 instructions.

 

One note, the random number generator used is from Agner Fog's public library 
and was not included in the attachment. You will have to provide a random 
number generator yourself or download the appropriate version from his 
well-known website (www.agner.org/optimize)

 

I hope this is the start of a discussion about further improvements that can be 
made, but you may use this code as you wish with no restrictions or guarantees. 
See the readme.txt file included in the attachment for more details (also 
partially included below).

 

Comments and feedback welcome,

 

René van de Veerdonk

 

Excerpts from the readme.txt file:

 

Introduction:

=============

 

Mockup of a high performance implementation of the game of Go using bitmaps

as the principle datastructures. This implementation is limited to measuring

the performance of the principle components by playing random playouts from

an empty 9x9 board, using Chinese scoring, not allowing pass moves until

no other moves are available. Suicide is not allowed, neither for single

or multi-stone groups. No checks are implemented for superko, but simple

ko's are disallowed explicitly.

 

Some implementation details:

============================

 

Benchmarking is done in test_board.cpp, where the number of playouts (n) and

repeats (j) for each test is set. Benchmarking is done starting from an empty

9x9 board until two (three with ko) passes are played in sequence. The

benchmark playout follow the same scheme as method board_t::play_random_game,

but with timing instrumentation added.

 

Clock counts (cc) are measured using the cc_time_of macro. Total run-time

measurement relies on the windows library.

 

For each move during a playout, the program goes through a sequence of

(1) identify all legal moves (~50cc)

    this follows Brown's standard definition; this definition differs from

    most Monte-Carlo engines and has a different set of pros and cons;

    bitmaps allow all moves to be processed in parallel

(2) pick a uniformly random move from the available choices (~75cc)

    relies on a modulus operation; is not 100% uniform to save speed

(3) playing it on the board (~170cc)

    lengthiest method; but rather straightforward

 

Playouts end after 256 moves or whenever two passes are played in succession

(three in case of a ko). The routines for each of these basic steps are

contained in the file board.cpp. The underlying datastructures are 128-bit

bitmaps, for which the sse2 instructions used to manipulate them are contained

in the file bitmap.cpp.

 

Further improvements can be expected from using any 64-bit OS that supports

several more assembly instructions, and from using sse4.1 instructions.

 

Example output (on a Intel 2.40 GHz Centrino Duo labtop):

=========================================================

 

[game] = 30469.7, [moves] = 111.071

[game] = 30245.4, [moves] = 111.068

[game] = 30221.7, [moves] = 111.089

[game] = 30264.8, [moves] = 111.122

[game] = 30205.8, [moves] = 111.101

 

[game] = 77.1084 kpps, [moves] = 111.023

[game] = 78.0488 kpps, [moves] = 111.045

[game] = 77.1084 kpps, [moves] = 111.046

[game] = 78.0488 kpps, [moves] = 111.131

[game] = 78.0488 kpps, [moves] = 111.082

 

[legal] 110/51, [pick] 110/74, [play] 106/169, [score] 40, [win] 0.4187

[legal] 111/51, [pick] 111/74, [play] 106/168, [score] 42, [win] 0.4201

[legal] 111/52, [pick] 111/74, [play] 106/169, [score] 43, [win] 0.4276

[legal] 111/52, [pick] 111/75, [play] 106/170, [score] 41, [win] 0.4092

[legal] 111/51, [pick] 111/74, [play] 106/171, [score] 45, [win] 0.4221

 

finished, press any key to exit

 

Explanation of output:

======================

 

There are three sets of repeated (5x) measurements, performed and timed

independently to avoid cross-contamination of inserted serialization

instructions and timing overhead (which still does not quite prevent

reordering of instructions across timing boundaries).

 

[game]  cc's per full playout

[moves] average number of moves per playout

 

[game]  thousands of playouts per second (kpps)

[moves] average number of moves per playout

 

"calls per playout" / "cc's per call"

[legal] identifies all legal moves on the board

[pick]  picks a random move

[play]  processes a move

[score] scores a final position

[win]   winrate for black

 

Acknowledgements:

=================

 

Antoine de Maricourt, for his very helpful description of his bitmap go

  implementation in detail on the computer-go mailing list:

  http://computer-go.org/pipermail/computer-go/2006-November/007136.html

Gunnar Farneback, for providing the well documented Brown engine,

  whoms legal move definition is used in this mockup

Lukasz Lew, for the high performance Libego implementation, which has inspired

  various parts of this mockup as well

Agner Fog, for the Mersenne Twister and clock-cycle timing library

Computer-go mailing list, where many ideas are posted and discussed

 

 


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

 

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to