Hi Micheal,
Thanks for the test (Intel -> AMD, Windows XP -> Windows 7, x32 -> x64,
great, three birds with one stone!). So much for portability. But,
hooray for test-cases!
This may be related to the __lzcnt intrinsic, giving you a completely
different result, i.e., 32 - my result. Could you try replacing this
function in bitmap.cpp? I believe I tested this alternative during
development and it generates only slightly worse performing code.
__forceinline
unsigned int bitmap_t::lowest_index () const {
const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
unsigned long m = _mm_movemask_epi8 (comp) ^ 0xffff;
if (!_BitScanForward (&m, m)) return 127;
/* offending instruction */
//return 8*m + __lzcnt (board.m128i_u32[m/4]);
/* new code follows */
unsigned long n;
_BitScanForward (&n, board.m128i_u32[m/4]);
return 8*m + n;
}
René
PS. x64 ... new playground, you could use __lzcnt64 ... so many things
to try!
On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams
<michaelwilliam...@gmail.com <mailto:michaelwilliam...@gmail.com>> wrote:
(AMD Phenom CPU)
Michael Williams wrote:
It came through fine the first time. Gmail probably hid the
duplicate text from you for convenience since it saw that it was
text that you already sent out. Or something.
I can compile the original (9x9) bitmap Go files in Windows 7
x64, but when I run it I get this:
test-error [bitmap_t::lowest_index]: 0 31
test-error [bitmap_t::lowest_index]: 1 30
test-error [bitmap_t::lowest_index]: 2 29
test-error [bitmap_t::lowest_index]: 3 28
test-error [bitmap_t::lowest_index]: 4 27
test-error [bitmap_t::lowest_index]: 5 26
test-error [bitmap_t::lowest_index]: 6 25
test-error [bitmap_t::lowest_index]: 7 24
test-error [bitmap_t::lowest_index]: 8 23
test-error [bitmap_t::lowest_index]: 9 22
test-error [bitmap_t::lowest_index]: 10 21
test-error [bitmap_t::lowest_index]: 11 20
test-error [bitmap_t::lowest_index]: 12 19
test-error [bitmap_t::lowest_index]: 13 18
test-error [bitmap_t::lowest_index]: 14 17
test-error [bitmap_t::lowest_index]: 15 16
test-error [bitmap_t::lowest_index]: 16 15
test-error [bitmap_t::lowest_index]: 17 14
test-error [bitmap_t::lowest_index]: 18 13
test-error [bitmap_t::lowest_index]: 19 12
test-error [bitmap_t::lowest_index]: 20 11
test-error [bitmap_t::lowest_index]: 21 10
test-error [bitmap_t::lowest_index]: 22 9
test-error [bitmap_t::lowest_index]: 23 8
test-error [bitmap_t::lowest_index]: 24 7
test-error [bitmap_t::lowest_index]: 25 6
test-error [bitmap_t::lowest_index]: 26 5
test-error [bitmap_t::lowest_index]: 27 4
test-error [bitmap_t::lowest_index]: 28 3
test-error [bitmap_t::lowest_index]: 29 2
test-error [bitmap_t::lowest_index]: 30 1
test-error [bitmap_t::lowest_index]: 31 0
test-error [bitmap_t::lowest_index]: 32 63
test-error [bitmap_t::lowest_index]: 33 62
test-error [bitmap_t::lowest_index]: 34 61
test-error [bitmap_t::lowest_index]: 35 60
test-error [bitmap_t::lowest_index]: 36 59
test-error [bitmap_t::lowest_index]: 37 58
test-error [bitmap_t::lowest_index]: 38 57
test-error [bitmap_t::lowest_index]: 39 56
test-error [bitmap_t::lowest_index]: 40 55
test-error [bitmap_t::lowest_index]: 41 54
test-error [bitmap_t::lowest_index]: 42 53
test-error [bitmap_t::lowest_index]: 43 52
test-error [bitmap_t::lowest_index]: 44 51
test-error [bitmap_t::lowest_index]: 45 50
test-error [bitmap_t::lowest_index]: 46 49
test-error [bitmap_t::lowest_index]: 47 48
test-error [bitmap_t::lowest_index]: 48 47
test-error [bitmap_t::lowest_index]: 49 46
test-error [bitmap_t::lowest_index]: 50 45
test-error [bitmap_t::lowest_index]: 51 44
test-error [bitmap_t::lowest_index]: 52 43
test-error [bitmap_t::lowest_index]: 53 42
test-error [bitmap_t::lowest_index]: 54 41
test-error [bitmap_t::lowest_index]: 55 40
test-error [bitmap_t::lowest_index]: 56 39
test-error [bitmap_t::lowest_index]: 57 38
test-error [bitmap_t::lowest_index]: 58 37
test-error [bitmap_t::lowest_index]: 59 36
test-error [bitmap_t::lowest_index]: 60 35
test-error [bitmap_t::lowest_index]: 61 34
test-error [bitmap_t::lowest_index]: 62 33
test-error [bitmap_t::lowest_index]: 63 32
test-error [bitmap_t::lowest_index]: 64 95
test-error [bitmap_t::lowest_index]: 65 94
test-error [bitmap_t::lowest_index]: 66 93
test-error [bitmap_t::lowest_index]: 67 92
test-error [bitmap_t::lowest_index]: 68 91
test-error [bitmap_t::lowest_index]: 69 90
test-error [bitmap_t::lowest_index]: 70 89
test-error [bitmap_t::lowest_index]: 71 88
test-error [bitmap_t::lowest_index]: 72 87
test-error [bitmap_t::lowest_index]: 73 86
test-error [bitmap_t::lowest_index]: 74 85
test-error [bitmap_t::lowest_index]: 75 84
test-error [bitmap_t::lowest_index]: 76 83
test-error [bitmap_t::lowest_index]: 77 82
test-error [bitmap_t::lowest_index]: 78 81
test-error [bitmap_t::lowest_index]: 79 80
test-error [bitmap_t::lowest_index]: 80 79
test-error-1 [bitmap_t::is_one_bit]: 0
test-error-1 [bitmap_t::is_one_bit]: 1
test-error-1 [bitmap_t::is_one_bit]: 2
test-error-1 [bitmap_t::is_one_bit]: 3
test-error-1 [bitmap_t::is_one_bit]: 4
test-error-1 [bitmap_t::is_one_bit]: 5
test-error-1 [bitmap_t::is_one_bit]: 6
test-error-1 [bitmap_t::is_one_bit]: 7
test-error-1 [bitmap_t::is_one_bit]: 8
test-error-1 [bitmap_t::is_one_bit]: 9
test-error-1 [bitmap_t::is_one_bit]: 10
test-error-1 [bitmap_t::is_one_bit]: 11
test-error-1 [bitmap_t::is_one_bit]: 12
test-error-1 [bitmap_t::is_one_bit]: 13
test-error-1 [bitmap_t::is_one_bit]: 14
test-error-1 [bitmap_t::is_one_bit]: 15
test-error-1 [bitmap_t::is_one_bit]: 16
test-error-1 [bitmap_t::is_one_bit]: 17
test-error-1 [bitmap_t::is_one_bit]: 18
test-error-1 [bitmap_t::is_one_bit]: 19
test-error-1 [bitmap_t::is_one_bit]: 20
test-error-1 [bitmap_t::is_one_bit]: 21
test-error-1 [bitmap_t::is_one_bit]: 22
test-error-1 [bitmap_t::is_one_bit]: 23
test-error-1 [bitmap_t::is_one_bit]: 24
test-error-1 [bitmap_t::is_one_bit]: 25
test-error-1 [bitmap_t::is_one_bit]: 26
test-error-1 [bitmap_t::is_one_bit]: 27
test-error-1 [bitmap_t::is_one_bit]: 28
test-error-1 [bitmap_t::is_one_bit]: 29
test-error-1 [bitmap_t::is_one_bit]: 30
test-error-1 [bitmap_t::is_one_bit]: 31
test-error-1 [bitmap_t::is_one_bit]: 32
test-error-1 [bitmap_t::is_one_bit]: 33
test-error-1 [bitmap_t::is_one_bit]: 34
test-error-1 [bitmap_t::is_one_bit]: 35
test-error-1 [bitmap_t::is_one_bit]: 36
test-error-1 [bitmap_t::is_one_bit]: 37
test-error-1 [bitmap_t::is_one_bit]: 38
test-error-1 [bitmap_t::is_one_bit]: 39
test-error-1 [bitmap_t::is_one_bit]: 40
test-error-1 [bitmap_t::is_one_bit]: 41
test-error-1 [bitmap_t::is_one_bit]: 42
test-error-1 [bitmap_t::is_one_bit]: 43
test-error-1 [bitmap_t::is_one_bit]: 44
test-error-1 [bitmap_t::is_one_bit]: 45
test-error-1 [bitmap_t::is_one_bit]: 46
test-error-1 [bitmap_t::is_one_bit]: 47
test-error-1 [bitmap_t::is_one_bit]: 48
test-error-1 [bitmap_t::is_one_bit]: 49
test-error-1 [bitmap_t::is_one_bit]: 50
test-error-1 [bitmap_t::is_one_bit]: 51
test-error-1 [bitmap_t::is_one_bit]: 52
test-error-1 [bitmap_t::is_one_bit]: 53
test-error-1 [bitmap_t::is_one_bit]: 54
test-error-1 [bitmap_t::is_one_bit]: 55
test-error-1 [bitmap_t::is_one_bit]: 56
test-error-1 [bitmap_t::is_one_bit]: 57
test-error-1 [bitmap_t::is_one_bit]: 58
test-error-1 [bitmap_t::is_one_bit]: 59
test-error-1 [bitmap_t::is_one_bit]: 60
test-error-1 [bitmap_t::is_one_bit]: 61
test-error-1 [bitmap_t::is_one_bit]: 62
test-error-1 [bitmap_t::is_one_bit]: 63
test-error-1 [bitmap_t::is_one_bit]: 64
test-error-1 [bitmap_t::is_one_bit]: 65
test-error-1 [bitmap_t::is_one_bit]: 66
test-error-1 [bitmap_t::is_one_bit]: 67
test-error-1 [bitmap_t::is_one_bit]: 68
test-error-1 [bitmap_t::is_one_bit]: 69
test-error-1 [bitmap_t::is_one_bit]: 70
test-error-1 [bitmap_t::is_one_bit]: 71
test-error-1 [bitmap_t::is_one_bit]: 72
test-error-1 [bitmap_t::is_one_bit]: 73
test-error-1 [bitmap_t::is_one_bit]: 74
test-error-1 [bitmap_t::is_one_bit]: 75
test-error-1 [bitmap_t::is_one_bit]: 76
test-error-1 [bitmap_t::is_one_bit]: 77
test-error-1 [bitmap_t::is_one_bit]: 78
test-error-1 [bitmap_t::is_one_bit]: 79
test-error-1 [bitmap_t::is_one_bit]: 80
And then it crashes.
René van de Veerdonk wrote:
[I do not know what happens, but this is the second time the
computer-go archives miss the body. Maybe its the
attachment. I apologize for sending this a second time, this
time without attachment. If someone knows what the problem
is ... I am using Gmail]
Antoine,
I apologize for misrepresenting your feelings. What I wanted
to convey is that your e-mail expressed that with the same
amount of computation, other implementation strategies
provide more information, so the information gained for the
effort put in is disappointing. In other words, bitmap-go
had better be much faster to make up for the lack of
information gained ... and it wasn't. That's pretty much
what you are saying as well, I believe. As are the others in
this thread. I think we can all agree on this.
Going into this project, I was well aware that I was going
to end up with light playouts only, and that heavy playouts
is the way to go except perhaps for some special purposes
such as ladder readouts. I was also well aware of the
scaling argument. But, I hadn't thought that this would be
the first thing thrown at my first post, as there are many
programmers that seem to be stuck at 9x9. Nonetheless,
bitmap-go as a topic keeps resurfacing on this mailing list
every once in a while and nobody ever put solid data and a
reference implementation behind it. That is what I wanted to
accomplish with my mockup.
Confession #2: I have been working on my own MCTS
implementation using the standard implementation methods
that almost everybody else is using. But there is a
never-ending laundry-list of things that I would like to
include before I would reach reasonable strength (where
reasonable is a moving target). In the mean time, there are
many others that have demonstrated much more capable
programmers than I ever will be. So, by providing this
bitmap-go mockup at least I had the feeling that I was
contributing something newsworthy to the conversation. This
may have never happened if I would wait until my MCTS
program is ready. I imagine that there are others on this
list in a similar situation. Besides, this was an
interesting side-project and perhaps someone else can
benefit from it (go for it Brian!). And, yes, it was fun.
Okay, enough mesmerizing, on to the main topic.
David, Lukasz,
I did modify my mockup to do 19x19 bitmap-go (attached). It
is a hardcoded solution using arrays of three 128-bit
variables. I did not even attempt to optimize this version,
so this is not the best possible solution. Nonetheless, here
is a comparison:
Example output (9x9):
=====================
[game] = 30236.1, [moves] = 111.071
[game] = 30249.7, [moves] = 111.068
[game] = 30145.7, [moves] = 111.089
[game] = 30237.7, [moves] = 111.122
[game] = 30210.1, [moves] = 111.101
[game] = 78.0488 kpps, [moves] = 111.023
[game] = 78.0488 kpps, [moves] = 111.045
[game] = 78.0488 kpps, [moves] = 111.046
[game] = 79.0123 kpps, [moves] = 111.131
[game] = 78.0488 kpps, [moves] = 111.082
[legal] 110/51, [pick] 110/74, [play] 106/168, [score] 44,
[win] 0.4187
[legal] 111/51, [pick] 111/74, [play] 106/168, [score] 40,
[win] 0.4201
[legal] 111/52, [pick] 111/75, [play] 106/168, [score] 42,
[win] 0.4276
[legal] 111/52, [pick] 111/74, [play] 106/169, [score] 41,
[win] 0.4092
[legal] 111/52, [pick] 111/74, [play] 106/169, [score] 44,
[win] 0.4221
Example output (19x19):
=======================
[game] = 316452, [moves] = 455.036
[game] = 316378, [moves] = 455.126
[game] = 316266, [moves] = 455.177
[game] = 316017, [moves] = 455.08
[game] = 316210, [moves] = 455.16
[game] = 7.45052 kpps, [moves] = 455.267
[game] = 7.45921 kpps, [moves] = 455.055
[game] = 7.45921 kpps, [moves] = 455.17
[game] = 7.45921 kpps, [moves] = 455.188
[game] = 7.47664 kpps, [moves] = 455.013
[legal] 454/144, [pick] 454/128, [play] 449/430, [score]
173, [win] 0.4592
[legal] 455/144, [pick] 455/128, [play] 449/431, [score]
173, [win] 0.4655
[legal] 454/144, [pick] 454/128, [play] 449/430, [score]
173, [win] 0.4611
[legal] 454/144, [pick] 454/128, [play] 449/431, [score]
173, [win] 0.4674
[legal] 455/144, [pick] 455/128, [play] 450/430, [score]
175, [win] 0.4661
Summary:
========
function 9x9 19x19 ratio
-----------------------------------
game [ cc ] 30216 316265 10.47
game [kpps] 78 7.46 10.45
moves 111 455 4.10
legal [ cc ] 52 144 2.77
pick [ cc ] 74 128 1.73
play [ cc ] 168 430 2.56
score [ cc ] 42 173 4.12
legal+pick+play 294 702 2.39
So, it looks that I was overly pessimistic in terms of
performance drop per move, which is a factor of 2.4x (with
little effort to optimize for 19x19). But, with 4.1x more
moves, this still resulted in a 10x speed penalty. With the
provided reference numbers, Libego only drops 4.5-5.0x,
indicating that there is almost no performance loss per move
(1.2x). Is this difference roughly in line with others
expectation?
With that, I hope this at least provides some data for
future discussions, and reference implementation for others
that may want to pick up the ball from here. Someone could
add a gtp-interface and make it available as one of Don's
reference implementations for starting go-programmers. I
would still be interested to improve this solution, but I
think I need a fresh set of eyes in order to make any more
progress. I would like to re-iterate that I expect
significant gains on a 64-bit OS as a result on the doubling
of the number of available registers. It would be nice if
someone would be able to post some results. Newer CPU's
(will) also have instructions available that can
significantly speed things up (bit-test, compare, popcount,
wider variables, etc).
René van de Veerdonk
On Tue, Aug 25, 2009 at 12:42 PM, Antoine de Maricourt
<antoine.de-marico...@m4x.org
<mailto:antoine.de-marico...@m4x.org>
<mailto:antoine.de-marico...@m4x.org
<mailto:antoine.de-marico...@m4x.org>>> wrote:
Hi René,
David,
Confession: I have not tested 19x19. As you have
noted, and
others before you over the years, a 19x19 board does
not fit in
one but three 128-bit registers and there would be a
rather big
penalty as a result, perhaps (likely?) wiping out all
of the
benefits of bitmaps. Antoine voiced his
disappointment about the
speed advantage even on 9x9 in the e-mail to the list
that
served as my basis. My hope, as Hideko Kato points
out, is in
longer registers or perhaps being able to port this
to a GPU. A
64-bit OS with twice the number registers would also
surely help
out. Nevertheless, I was surprised that I was able to
get to
almost the same level of speed as Libego provides.
As far as I remember, I was not disappointed by the speed
itself on
9x9 boards, but mainly with the following 2 points:
a) my feeling is that, as you say, it does not scale very
well on
19x19 (on current hardware).
I don't think other "classical" implementations suffer
such a big
penalty when scaling from 9x9 to 19x19.
b) I also felt that this kind of implementation was not
very flexible.
For instance, I had another classical implementation,
running at
equivalent speed, but in which local 3x3 pattern matching
was almost
for free, as well as other more elaborated information.
When I started to think about introducing 3x3 local
patterns in the
bitmap only implementation, it was clear it would not be
for free.
At that time, my conclusion was that if one only needs
pure random
play with no intelligence at all during playouts, then bitmap
implementation could compete (at least on 9x9).
If one needs more elaborated knowledge (such as small
patterns
matching, or even knowledge about blocks of stones), then
pure
bitmap implementation is probably not so competitive.
I thus gave up with the idea and jumped to more promising
ones.
Anyway, I'm glad my post has been usefull to you. And I
encourage
you to improve your implementation and let us know,
especially if
you have fun. Starting with something and playing with it
is a good
way to have new ideas, even if this makes your initial
ones look
less interesting a while after.
Best regards,
Antoine
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
<mailto:computer-go@computer-go.org>
<mailto:computer-go@computer-go.org
<mailto:computer-go@computer-go.org>>
http://www.computer-go.org/mailman/listinfo/computer-go/
------------------------------------------------------------------------
_______________________________________________
computer-go mailing list
computer-go@computer-go.org <mailto:computer-go@computer-go.org>
http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________
computer-go mailing list
computer-go@computer-go.org <mailto:computer-go@computer-go.org>
http://www.computer-go.org/mailman/listinfo/computer-go/
------------------------------------------------------------------------
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/