Micheal,
Thanks for trying. Could you try one or two more things?
(1) Replace the _BitScanForward in the new code to _BitScanReverse. This
probably won't help.
(2) Add this to bitmapgo.cpp:
/* sandbox */
const bool sandbox = true;
void test_sandbox () {
unsigned long m = 220;
unsigned char test_it = _BitScanForward (&m, 0);
std::cout << "testing _BitScanForward (&m, 0): "
<< (int) test_it << " " << m << "\n";
test_it = _BitScanForward (&m, 1);
std::cout << "testing _BitScanForward (&m, 1): "
<< (int) test_it << " " << m << "\n";
test_it = _BitScanForward (&m, 256);
std::cout << "testing _BitScanForward (&m, 256): "
<< (int) test_it << " " << m << "\n";
test_it = _BitScanReverse (&m, 0);
std::cout << "testing _BitScanReverse (&m, 0): "
<< (int) test_it << " " << m << "\n";
test_it = _BitScanReverse (&m, 1);
std::cout << "testing _BitScanReverse (&m, 1): "
<< (int) test_it << " " << m << "\n";
test_it = _BitScanReverse (&m, 256);
std::cout << "testing _BitScanReverse (&m, 256): "
<< (int) test_it << " " << m << "\n";
}
and add a line something like this in the main program:
if (test && sandbox) test_sandbox (); /* new line */
if (test && verify) test = test_bitmap_t::test (); /* already there */
That does not solve anything, but it may tell me if those functions are
doing for you what they do for me.
My output is:
testing _BitScanForward (&m, 0): 0 16843009
testing _BitScanForward (&m, 1): 1 0
testing _BitScanForward (&m, 256): 1 8
testing _BitScanReverse (&m, 0): 0 3432104
testing _BitScanReverse (&m, 1): 1 0
testing _BitScanReverse (&m, 256): 1 8
Thanks, I really appreciate your help !
René
On Tue, Aug 25, 2009 at 6:38 PM, Michael Williams
<michaelwilliam...@gmail.com <mailto:michaelwilliam...@gmail.com>> wrote:
That doesn't seem to have any effect on the results.
René van de Veerdonk wrote:
Hi Micheal,
Thanks for the test (Intel -> AMD, Windows XP -> Windows 7, x32
-> x64, great, three birds with one stone!). So much for
portability. But, hooray for test-cases!
This may be related to the __lzcnt intrinsic, giving you a
completely different result, i.e., 32 - my result. Could you try
replacing this function in bitmap.cpp? I believe I tested this
alternative during development and it generates only slightly
worse performing code.
__forceinline
unsigned int bitmap_t::lowest_index () const {
const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
unsigned long m = _mm_movemask_epi8 (comp) ^ 0xffff;
if (!_BitScanForward (&m, m)) return 127;
/* offending instruction */
//return 8*m + __lzcnt (board.m128i_u32[m/4]);
/* new code follows */
unsigned long n;
_BitScanForward (&n, board.m128i_u32[m/4]);
return 8*m + n;
}
René
PS. x64 ... new playground, you could use __lzcnt64 ... so many
things to try!
On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams
<michaelwilliam...@gmail.com
<mailto:michaelwilliam...@gmail.com>
<mailto:michaelwilliam...@gmail.com
<mailto:michaelwilliam...@gmail.com>>> wrote:
(AMD Phenom CPU)
Michael Williams wrote:
It came through fine the first time. Gmail probably hid the
duplicate text from you for convenience since it saw that
it was
text that you already sent out. Or something.
I can compile the original (9x9) bitmap Go files in Windows 7
x64, but when I run it I get this:
test-error [bitmap_t::lowest_index]: 0 31
test-error [bitmap_t::lowest_index]: 1 30
test-error [bitmap_t::lowest_index]: 2 29
test-error [bitmap_t::lowest_index]: 3 28
test-error [bitmap_t::lowest_index]: 4 27
test-error [bitmap_t::lowest_index]: 5 26
test-error [bitmap_t::lowest_index]: 6 25
test-error [bitmap_t::lowest_index]: 7 24
test-error [bitmap_t::lowest_index]: 8 23
test-error [bitmap_t::lowest_index]: 9 22
test-error [bitmap_t::lowest_index]: 10 21
test-error [bitmap_t::lowest_index]: 11 20
test-error [bitmap_t::lowest_index]: 12 19
test-error [bitmap_t::lowest_index]: 13 18
test-error [bitmap_t::lowest_index]: 14 17
test-error [bitmap_t::lowest_index]: 15 16
test-error [bitmap_t::lowest_index]: 16 15
test-error [bitmap_t::lowest_index]: 17 14
test-error [bitmap_t::lowest_index]: 18 13
test-error [bitmap_t::lowest_index]: 19 12
test-error [bitmap_t::lowest_index]: 20 11
test-error [bitmap_t::lowest_index]: 21 10
test-error [bitmap_t::lowest_index]: 22 9
test-error [bitmap_t::lowest_index]: 23 8
test-error [bitmap_t::lowest_index]: 24 7
test-error [bitmap_t::lowest_index]: 25 6
test-error [bitmap_t::lowest_index]: 26 5
test-error [bitmap_t::lowest_index]: 27 4
test-error [bitmap_t::lowest_index]: 28 3
test-error [bitmap_t::lowest_index]: 29 2
test-error [bitmap_t::lowest_index]: 30 1
test-error [bitmap_t::lowest_index]: 31 0
test-error [bitmap_t::lowest_index]: 32 63
test-error [bitmap_t::lowest_index]: 33 62
test-error [bitmap_t::lowest_index]: 34 61
test-error [bitmap_t::lowest_index]: 35 60
test-error [bitmap_t::lowest_index]: 36 59
test-error [bitmap_t::lowest_index]: 37 58
test-error [bitmap_t::lowest_index]: 38 57
test-error [bitmap_t::lowest_index]: 39 56
test-error [bitmap_t::lowest_index]: 40 55
test-error [bitmap_t::lowest_index]: 41 54
test-error [bitmap_t::lowest_index]: 42 53
test-error [bitmap_t::lowest_index]: 43 52
test-error [bitmap_t::lowest_index]: 44 51
test-error [bitmap_t::lowest_index]: 45 50
test-error [bitmap_t::lowest_index]: 46 49
test-error [bitmap_t::lowest_index]: 47 48
test-error [bitmap_t::lowest_index]: 48 47
test-error [bitmap_t::lowest_index]: 49 46
test-error [bitmap_t::lowest_index]: 50 45
test-error [bitmap_t::lowest_index]: 51 44
test-error [bitmap_t::lowest_index]: 52 43
test-error [bitmap_t::lowest_index]: 53 42
test-error [bitmap_t::lowest_index]: 54 41
test-error [bitmap_t::lowest_index]: 55 40
test-error [bitmap_t::lowest_index]: 56 39
test-error [bitmap_t::lowest_index]: 57 38
test-error [bitmap_t::lowest_index]: 58 37
test-error [bitmap_t::lowest_index]: 59 36
test-error [bitmap_t::lowest_index]: 60 35
test-error [bitmap_t::lowest_index]: 61 34
test-error [bitmap_t::lowest_index]: 62 33
test-error [bitmap_t::lowest_index]: 63 32
test-error [bitmap_t::lowest_index]: 64 95
test-error [bitmap_t::lowest_index]: 65 94
test-error [bitmap_t::lowest_index]: 66 93
test-error [bitmap_t::lowest_index]: 67 92
test-error [bitmap_t::lowest_index]: 68 91
test-error [bitmap_t::lowest_index]: 69 90
test-error [bitmap_t::lowest_index]: 70 89
test-error [bitmap_t::lowest_index]: 71 88
test-error [bitmap_t::lowest_index]: 72 87
test-error [bitmap_t::lowest_index]: 73 86
test-error [bitmap_t::lowest_index]: 74 85
test-error [bitmap_t::lowest_index]: 75 84
test-error [bitmap_t::lowest_index]: 76 83
test-error [bitmap_t::lowest_index]: 77 82
test-error [bitmap_t::lowest_index]: 78 81
test-error [bitmap_t::lowest_index]: 79 80
test-error [bitmap_t::lowest_index]: 80 79
test-error-1 [bitmap_t::is_one_bit]: 0
test-error-1 [bitmap_t::is_one_bit]: 1
test-error-1 [bitmap_t::is_one_bit]: 2
test-error-1 [bitmap_t::is_one_bit]: 3
test-error-1 [bitmap_t::is_one_bit]: 4
test-error-1 [bitmap_t::is_one_bit]: 5
test-error-1 [bitmap_t::is_one_bit]: 6
test-error-1 [bitmap_t::is_one_bit]: 7
test-error-1 [bitmap_t::is_one_bit]: 8
test-error-1 [bitmap_t::is_one_bit]: 9
test-error-1 [bitmap_t::is_one_bit]: 10
test-error-1 [bitmap_t::is_one_bit]: 11
test-error-1 [bitmap_t::is_one_bit]: 12
test-error-1 [bitmap_t::is_one_bit]: 13
test-error-1 [bitmap_t::is_one_bit]: 14
test-error-1 [bitmap_t::is_one_bit]: 15
test-error-1 [bitmap_t::is_one_bit]: 16
test-error-1 [bitmap_t::is_one_bit]: 17
test-error-1 [bitmap_t::is_one_bit]: 18
test-error-1 [bitmap_t::is_one_bit]: 19
test-error-1 [bitmap_t::is_one_bit]: 20
test-error-1 [bitmap_t::is_one_bit]: 21
test-error-1 [bitmap_t::is_one_bit]: 22
test-error-1 [bitmap_t::is_one_bit]: 23
test-error-1 [bitmap_t::is_one_bit]: 24
test-error-1 [bitmap_t::is_one_bit]: 25
test-error-1 [bitmap_t::is_one_bit]: 26
test-error-1 [bitmap_t::is_one_bit]: 27
test-error-1 [bitmap_t::is_one_bit]: 28
test-error-1 [bitmap_t::is_one_bit]: 29
test-error-1 [bitmap_t::is_one_bit]: 30
test-error-1 [bitmap_t::is_one_bit]: 31
test-error-1 [bitmap_t::is_one_bit]: 32
test-error-1 [bitmap_t::is_one_bit]: 33
test-error-1 [bitmap_t::is_one_bit]: 34
test-error-1 [bitmap_t::is_one_bit]: 35
test-error-1 [bitmap_t::is_one_bit]: 36
test-error-1 [bitmap_t::is_one_bit]: 37
test-error-1 [bitmap_t::is_one_bit]: 38
test-error-1 [bitmap_t::is_one_bit]: 39
test-error-1 [bitmap_t::is_one_bit]: 40
test-error-1 [bitmap_t::is_one_bit]: 41
test-error-1 [bitmap_t::is_one_bit]: 42
test-error-1 [bitmap_t::is_one_bit]: 43
test-error-1 [bitmap_t::is_one_bit]: 44
test-error-1 [bitmap_t::is_one_bit]: 45
test-error-1 [bitmap_t::is_one_bit]: 46
test-error-1 [bitmap_t::is_one_bit]: 47
test-error-1 [bitmap_t::is_one_bit]: 48
test-error-1 [bitmap_t::is_one_bit]: 49
test-error-1 [bitmap_t::is_one_bit]: 50
test-error-1 [bitmap_t::is_one_bit]: 51
test-error-1 [bitmap_t::is_one_bit]: 52
test-error-1 [bitmap_t::is_one_bit]: 53
test-error-1 [bitmap_t::is_one_bit]: 54
test-error-1 [bitmap_t::is_one_bit]: 55
test-error-1 [bitmap_t::is_one_bit]: 56
test-error-1 [bitmap_t::is_one_bit]: 57
test-error-1 [bitmap_t::is_one_bit]: 58
test-error-1 [bitmap_t::is_one_bit]: 59
test-error-1 [bitmap_t::is_one_bit]: 60
test-error-1 [bitmap_t::is_one_bit]: 61
test-error-1 [bitmap_t::is_one_bit]: 62
test-error-1 [bitmap_t::is_one_bit]: 63
test-error-1 [bitmap_t::is_one_bit]: 64
test-error-1 [bitmap_t::is_one_bit]: 65
test-error-1 [bitmap_t::is_one_bit]: 66
test-error-1 [bitmap_t::is_one_bit]: 67
test-error-1 [bitmap_t::is_one_bit]: 68
test-error-1 [bitmap_t::is_one_bit]: 69
test-error-1 [bitmap_t::is_one_bit]: 70
test-error-1 [bitmap_t::is_one_bit]: 71
test-error-1 [bitmap_t::is_one_bit]: 72
test-error-1 [bitmap_t::is_one_bit]: 73
test-error-1 [bitmap_t::is_one_bit]: 74
test-error-1 [bitmap_t::is_one_bit]: 75
test-error-1 [bitmap_t::is_one_bit]: 76
test-error-1 [bitmap_t::is_one_bit]: 77
test-error-1 [bitmap_t::is_one_bit]: 78
test-error-1 [bitmap_t::is_one_bit]: 79
test-error-1 [bitmap_t::is_one_bit]: 80
And then it crashes.
René van de Veerdonk wrote:
[I do not know what happens, but this is the second
time the
computer-go archives miss the body. Maybe its the
attachment. I apologize for sending this a second
time, this
time without attachment. If someone knows what the
problem
is ... I am using Gmail]
Antoine,
I apologize for misrepresenting your feelings. What I
wanted
to convey is that your e-mail expressed that with the
same
amount of computation, other implementation strategies
provide more information, so the information gained
for the
effort put in is disappointing. In other words, bitmap-go
had better be much faster to make up for the lack of
information gained ... and it wasn't. That's pretty much
what you are saying as well, I believe. As are the
others in
this thread. I think we can all agree on this.
Going into this project, I was well aware that I was
going
to end up with light playouts only, and that heavy
playouts
is the way to go except perhaps for some special purposes
such as ladder readouts. I was also well aware of the
scaling argument. But, I hadn't thought that this
would be
the first thing thrown at my first post, as there are
many
programmers that seem to be stuck at 9x9. Nonetheless,
bitmap-go as a topic keeps resurfacing on this
mailing list
every once in a while and nobody ever put solid data
and a
reference implementation behind it. That is what I
wanted to
accomplish with my mockup.
Confession #2: I have been working on my own MCTS
implementation using the standard implementation methods
that almost everybody else is using. But there is a
never-ending laundry-list of things that I would like to
include before I would reach reasonable strength (where
reasonable is a moving target). In the mean time,
there are
many others that have demonstrated much more capable
programmers than I ever will be. So, by providing this
bitmap-go mockup at least I had the feeling that I was
contributing something newsworthy to the
conversation. This
may have never happened if I would wait until my MCTS
program is ready. I imagine that there are others on this
list in a similar situation. Besides, this was an
interesting side-project and perhaps someone else can
benefit from it (go for it Brian!). And, yes, it was fun.
Okay, enough mesmerizing, on to the main topic.
David, Lukasz,
I did modify my mockup to do 19x19 bitmap-go
(attached). It
is a hardcoded solution using arrays of three 128-bit
variables. I did not even attempt to optimize this
version,
so this is not the best possible solution.
Nonetheless, here
is a comparison:
Example output (9x9):
=====================
[game] = 30236.1, [moves] = 111.071
[game] = 30249.7, [moves] = 111.068
[game] = 30145.7, [moves] = 111.089
[game] = 30237.7, [moves] = 111.122
[game] = 30210.1, [moves] = 111.101
[game] = 78.0488 kpps, [moves] = 111.023
[game] = 78.0488 kpps, [moves] = 111.045
[game] = 78.0488 kpps, [moves] = 111.046
[game] = 79.0123 kpps, [moves] = 111.131
[game] = 78.0488 kpps, [moves] = 111.082
[legal] 110/51, [pick] 110/74, [play] 106/168,
[score] 44,
[win] 0.4187
[legal] 111/51, [pick] 111/74, [play] 106/168,
[score] 40,
[win] 0.4201
[legal] 111/52, [pick] 111/75, [play] 106/168,
[score] 42,
[win] 0.4276
[legal] 111/52, [pick] 111/74, [play] 106/169,
[score] 41,
[win] 0.4092
[legal] 111/52, [pick] 111/74, [play] 106/169,
[score] 44,
[win] 0.4221
Example output (19x19):
=======================
[game] = 316452, [moves] = 455.036
[game] = 316378, [moves] = 455.126
[game] = 316266, [moves] = 455.177
[game] = 316017, [moves] = 455.08
[game] = 316210, [moves] = 455.16
[game] = 7.45052 kpps, [moves] = 455.267
[game] = 7.45921 kpps, [moves] = 455.055
[game] = 7.45921 kpps, [moves] = 455.17
[game] = 7.45921 kpps, [moves] = 455.188
[game] = 7.47664 kpps, [moves] = 455.013
[legal] 454/144, [pick] 454/128, [play] 449/430, [score]
173, [win] 0.4592
[legal] 455/144, [pick] 455/128, [play] 449/431, [score]
173, [win] 0.4655
[legal] 454/144, [pick] 454/128, [play] 449/430, [score]
173, [win] 0.4611
[legal] 454/144, [pick] 454/128, [play] 449/431, [score]
173, [win] 0.4674
[legal] 455/144, [pick] 455/128, [play] 450/430, [score]
175, [win] 0.4661
Summary:
========
function 9x9 19x19 ratio
-----------------------------------
game [ cc ] 30216 316265 10.47
game [kpps] 78 7.46 10.45
moves 111 455 4.10
legal [ cc ] 52 144 2.77
pick [ cc ] 74 128 1.73
play [ cc ] 168 430 2.56
score [ cc ] 42 173 4.12
legal+pick+play 294 702 2.39
So, it looks that I was overly pessimistic in terms of
performance drop per move, which is a factor of 2.4x
(with
little effort to optimize for 19x19). But, with 4.1x more
moves, this still resulted in a 10x speed penalty.
With the
provided reference numbers, Libego only drops 4.5-5.0x,
indicating that there is almost no performance loss
per move
(1.2x). Is this difference roughly in line with others
expectation?
With that, I hope this at least provides some data for
future discussions, and reference implementation for
others
that may want to pick up the ball from here. Someone
could
add a gtp-interface and make it available as one of Don's
reference implementations for starting go-programmers. I
would still be interested to improve this solution, but I
think I need a fresh set of eyes in order to make any
more
progress. I would like to re-iterate that I expect
significant gains on a 64-bit OS as a result on the
doubling
of the number of available registers. It would be nice if
someone would be able to post some results. Newer CPU's
(will) also have instructions available that can
significantly speed things up (bit-test, compare,
popcount,
wider variables, etc).
René van de Veerdonk
On Tue, Aug 25, 2009 at 12:42 PM, Antoine de Maricourt
<antoine.de-marico...@m4x.org
<mailto:antoine.de-marico...@m4x.org>
<mailto:antoine.de-marico...@m4x.org
<mailto:antoine.de-marico...@m4x.org>>
<mailto:antoine.de-marico...@m4x.org
<mailto:antoine.de-marico...@m4x.org>
<mailto:antoine.de-marico...@m4x.org
<mailto:antoine.de-marico...@m4x.org>>>> wrote:
Hi René,
David,
Confession: I have not tested 19x19. As you have
noted, and
others before you over the years, a 19x19
board does
not fit in
one but three 128-bit registers and there
would be a
rather big
penalty as a result, perhaps (likely?) wiping
out all
of the
benefits of bitmaps. Antoine voiced his
disappointment about the
speed advantage even on 9x9 in the e-mail to
the list
that
served as my basis. My hope, as Hideko Kato points
out, is in
longer registers or perhaps being able to port
this
to a GPU. A
64-bit OS with twice the number registers
would also
surely help
out. Nevertheless, I was surprised that I was
able to
get to
almost the same level of speed as Libego provides.
As far as I remember, I was not disappointed by
the speed
itself on
9x9 boards, but mainly with the following 2 points:
a) my feeling is that, as you say, it does not
scale very
well on
19x19 (on current hardware).
I don't think other "classical" implementations suffer
such a big
penalty when scaling from 9x9 to 19x19.
b) I also felt that this kind of implementation
was not
very flexible.
For instance, I had another classical implementation,
running at
equivalent speed, but in which local 3x3 pattern
matching
was almost
for free, as well as other more elaborated
information.
When I started to think about introducing 3x3 local
patterns in the
bitmap only implementation, it was clear it would
not be
for free.
At that time, my conclusion was that if one only needs
pure random
play with no intelligence at all during playouts,
then bitmap
implementation could compete (at least on 9x9).
If one needs more elaborated knowledge (such as small
patterns
matching, or even knowledge about blocks of
stones), then
pure
bitmap implementation is probably not so competitive.
I thus gave up with the idea and jumped to more
promising
ones.
Anyway, I'm glad my post has been usefull to you.
And I
encourage
you to improve your implementation and let us know,
especially if
you have fun. Starting with something and playing
with it
is a good
way to have new ideas, even if this makes your initial
ones look
less interesting a while after.
Best regards,
Antoine
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
<mailto:computer-go@computer-go.org>
<mailto:computer-go@computer-go.org
<mailto:computer-go@computer-go.org>>
<mailto:computer-go@computer-go.org
<mailto:computer-go@computer-go.org>
<mailto:computer-go@computer-go.org
<mailto:computer-go@computer-go.org>>>
http://www.computer-go.org/mailman/listinfo/computer-go/
------------------------------------------------------------------------
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
<mailto:computer-go@computer-go.org>
<mailto:computer-go@computer-go.org
<mailto:computer-go@computer-go.org>>
http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
<mailto:computer-go@computer-go.org>
<mailto:computer-go@computer-go.org
<mailto:computer-go@computer-go.org>>
http://www.computer-go.org/mailman/listinfo/computer-go/
------------------------------------------------------------------------
_______________________________________________
computer-go mailing list
computer-go@computer-go.org <mailto:computer-go@computer-go.org>
http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________
computer-go mailing list
computer-go@computer-go.org <mailto:computer-go@computer-go.org>
http://www.computer-go.org/mailman/listinfo/computer-go/
------------------------------------------------------------------------
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/