Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Antoine de Maricourt

Hi René,

David,

Confession: I have not tested 19x19. As you have noted, and others 
before you over the years, a 19x19 board does not fit in one but three 
128-bit registers and there would be a rather big penalty as a result, 
perhaps (likely?) wiping out all of the benefits of bitmaps. Antoine 
voiced his disappointment about the speed advantage even on 9x9 in the 
e-mail to the list that served as my basis. My hope, as Hideko Kato 
points out, is in longer registers or perhaps being able to port this 
to a GPU. A 64-bit OS with twice the number registers would also 
surely help out. Nevertheless, I was surprised that I was able to get 
to almost the same level of speed as Libego provides.




As far as I remember, I was not disappointed by the speed itself on 9x9 
boards, but mainly with the following 2 points:


a) my feeling is that, as you say, it does not scale very well on 19x19 
(on current hardware).
I don't think other "classical" implementations suffer such a big 
penalty when scaling from 9x9 to 19x19.


b) I also felt that this kind of implementation was not very flexible.
For instance, I had another classical implementation, running at 
equivalent speed, but in which local 3x3 pattern matching was almost for 
free, as well as other more elaborated information.
When I started to think about introducing 3x3 local patterns in the 
bitmap only implementation, it was clear it would not be for free.


At that time, my conclusion was that if one only needs pure random play 
with no intelligence at all during playouts, then bitmap implementation 
could compete (at least on 9x9).
If one needs more elaborated knowledge (such as small patterns matching, 
or even knowledge about blocks of stones), then pure bitmap 
implementation is probably not so competitive.

I thus gave up with the idea and jumped to more promising ones.

Anyway, I'm glad my post has been usefull to you. And I encourage you to 
improve your implementation and let us know, especially if you have fun. 
Starting with something and playing with it is a good way to have new 
ideas, even if this makes your initial ones look less interesting a 
while after.


Best regards,

Antoine

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread René van de Veerdonk
[I do not know what happens, but this is the second time the computer-go
archives miss the body. Maybe its the attachment. I apologize for sending
this a second time, this time without attachment. If someone knows what the
problem is ... I am using Gmail]

Antoine,
I apologize for misrepresenting your feelings. What I wanted to convey is
that your e-mail expressed that with the same amount of computation, other
implementation strategies provide more information, so the information
gained for the effort put in is disappointing. In other words, bitmap-go had
better be much faster to make up for the lack of information gained ... and
it wasn't. That's pretty much what you are saying as well, I believe. As are
the others in this thread. I think we can all agree on this.

Going into this project, I was well aware that I was going to end up with
light playouts only, and that heavy playouts is the way to go except perhaps
for some special purposes such as ladder readouts. I was also well aware of
the scaling argument. But, I hadn't thought that this would be the first
thing thrown at my first post, as there are many programmers that seem to be
stuck at 9x9. Nonetheless, bitmap-go as a topic keeps resurfacing on this
mailing list every once in a while and nobody ever put solid data and a
reference implementation behind it. That is what I wanted to accomplish with
my mockup.

Confession #2: I have been working on my own MCTS implementation using the
standard implementation methods that almost everybody else is using. But
there is a never-ending laundry-list of things that I would like to include
before I would reach reasonable strength (where reasonable is a moving
target). In the mean time, there are many others that have demonstrated much
more capable programmers than I ever will be. So, by providing this
bitmap-go mockup at least I had the feeling that I was contributing
something newsworthy to the conversation. This may have never happened if I
would wait until my MCTS program is ready. I imagine that there are others
on this list in a similar situation. Besides, this was an interesting
side-project and perhaps someone else can benefit from it (go for it
Brian!). And, yes, it was fun.

Okay, enough mesmerizing, on to the main topic.

David, Lukasz,

I did modify my mockup to do 19x19 bitmap-go (attached). It is a hardcoded
solution using arrays of three 128-bit variables. I did not even attempt to
optimize this version, so this is not the best possible solution.
Nonetheless, here is a comparison:

Example output (9x9):
=

[game] = 30236.1, [moves] = 111.071
[game] = 30249.7, [moves] = 111.068
[game] = 30145.7, [moves] = 111.089
[game] = 30237.7, [moves] = 111.122
[game] = 30210.1, [moves] = 111.101

[game] = 78.0488 kpps, [moves] = 111.023
[game] = 78.0488 kpps, [moves] = 111.045
[game] = 78.0488 kpps, [moves] = 111.046
[game] = 79.0123 kpps, [moves] = 111.131
[game] = 78.0488 kpps, [moves] = 111.082

[legal] 110/51, [pick] 110/74, [play] 106/168, [score] 44, [win] 0.4187
[legal] 111/51, [pick] 111/74, [play] 106/168, [score] 40, [win] 0.4201
[legal] 111/52, [pick] 111/75, [play] 106/168, [score] 42, [win] 0.4276
[legal] 111/52, [pick] 111/74, [play] 106/169, [score] 41, [win] 0.4092
[legal] 111/52, [pick] 111/74, [play] 106/169, [score] 44, [win] 0.4221

Example output (19x19):
===

[game] = 316452, [moves] = 455.036
[game] = 316378, [moves] = 455.126
[game] = 316266, [moves] = 455.177
[game] = 316017, [moves] = 455.08
[game] = 316210, [moves] = 455.16

[game] = 7.45052 kpps, [moves] = 455.267
[game] = 7.45921 kpps, [moves] = 455.055
[game] = 7.45921 kpps, [moves] = 455.17
[game] = 7.45921 kpps, [moves] = 455.188
[game] = 7.47664 kpps, [moves] = 455.013

[legal] 454/144, [pick] 454/128, [play] 449/430, [score] 173, [win] 0.4592
[legal] 455/144, [pick] 455/128, [play] 449/431, [score] 173, [win] 0.4655
[legal] 454/144, [pick] 454/128, [play] 449/430, [score] 173, [win] 0.4611
[legal] 454/144, [pick] 454/128, [play] 449/431, [score] 173, [win] 0.4674
[legal] 455/144, [pick] 455/128, [play] 450/430, [score] 175, [win] 0.4661

Summary:


function9x9   19x19   ratio
---
game  [ cc ]  30216  316265   10.47
game  [kpps] 787.46   10.45
moves   111 4554.10
legal [ cc ] 52 1442.77
pick  [ cc ] 74 1281.73
play  [ cc ]168 4302.56
score [ cc ] 42 1734.12

legal+pick+play 294 7022.39

So, it looks that I was overly pessimistic in terms of performance drop per
move, which is a factor of 2.4x (with little effort to optimize for 19x19).
But, with 4.1x more moves, this still resulted in a 10x speed penalty. With
the provided reference numbers, Libego only drops 4.5-5.0x, indicating that
there is almost no performance loss per move (1.2x). Is this difference
roughly in line with others expectation?

With that, I hope this at least provides some data for future dis

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Michael Williams
It came through fine the first time.  Gmail probably hid the duplicate text from you for convenience since it saw that it was text that you already sent out. 
Or something.


I can compile the original (9x9) bitmap Go files in Windows 7 x64, but when I 
run it I get this:

test-error [bitmap_t::lowest_index]: 0 31
test-error [bitmap_t::lowest_index]: 1 30
test-error [bitmap_t::lowest_index]: 2 29
test-error [bitmap_t::lowest_index]: 3 28
test-error [bitmap_t::lowest_index]: 4 27
test-error [bitmap_t::lowest_index]: 5 26
test-error [bitmap_t::lowest_index]: 6 25
test-error [bitmap_t::lowest_index]: 7 24
test-error [bitmap_t::lowest_index]: 8 23
test-error [bitmap_t::lowest_index]: 9 22
test-error [bitmap_t::lowest_index]: 10 21
test-error [bitmap_t::lowest_index]: 11 20
test-error [bitmap_t::lowest_index]: 12 19
test-error [bitmap_t::lowest_index]: 13 18
test-error [bitmap_t::lowest_index]: 14 17
test-error [bitmap_t::lowest_index]: 15 16
test-error [bitmap_t::lowest_index]: 16 15
test-error [bitmap_t::lowest_index]: 17 14
test-error [bitmap_t::lowest_index]: 18 13
test-error [bitmap_t::lowest_index]: 19 12
test-error [bitmap_t::lowest_index]: 20 11
test-error [bitmap_t::lowest_index]: 21 10
test-error [bitmap_t::lowest_index]: 22 9
test-error [bitmap_t::lowest_index]: 23 8
test-error [bitmap_t::lowest_index]: 24 7
test-error [bitmap_t::lowest_index]: 25 6
test-error [bitmap_t::lowest_index]: 26 5
test-error [bitmap_t::lowest_index]: 27 4
test-error [bitmap_t::lowest_index]: 28 3
test-error [bitmap_t::lowest_index]: 29 2
test-error [bitmap_t::lowest_index]: 30 1
test-error [bitmap_t::lowest_index]: 31 0
test-error [bitmap_t::lowest_index]: 32 63
test-error [bitmap_t::lowest_index]: 33 62
test-error [bitmap_t::lowest_index]: 34 61
test-error [bitmap_t::lowest_index]: 35 60
test-error [bitmap_t::lowest_index]: 36 59
test-error [bitmap_t::lowest_index]: 37 58
test-error [bitmap_t::lowest_index]: 38 57
test-error [bitmap_t::lowest_index]: 39 56
test-error [bitmap_t::lowest_index]: 40 55
test-error [bitmap_t::lowest_index]: 41 54
test-error [bitmap_t::lowest_index]: 42 53
test-error [bitmap_t::lowest_index]: 43 52
test-error [bitmap_t::lowest_index]: 44 51
test-error [bitmap_t::lowest_index]: 45 50
test-error [bitmap_t::lowest_index]: 46 49
test-error [bitmap_t::lowest_index]: 47 48
test-error [bitmap_t::lowest_index]: 48 47
test-error [bitmap_t::lowest_index]: 49 46
test-error [bitmap_t::lowest_index]: 50 45
test-error [bitmap_t::lowest_index]: 51 44
test-error [bitmap_t::lowest_index]: 52 43
test-error [bitmap_t::lowest_index]: 53 42
test-error [bitmap_t::lowest_index]: 54 41
test-error [bitmap_t::lowest_index]: 55 40
test-error [bitmap_t::lowest_index]: 56 39
test-error [bitmap_t::lowest_index]: 57 38
test-error [bitmap_t::lowest_index]: 58 37
test-error [bitmap_t::lowest_index]: 59 36
test-error [bitmap_t::lowest_index]: 60 35
test-error [bitmap_t::lowest_index]: 61 34
test-error [bitmap_t::lowest_index]: 62 33
test-error [bitmap_t::lowest_index]: 63 32
test-error [bitmap_t::lowest_index]: 64 95
test-error [bitmap_t::lowest_index]: 65 94
test-error [bitmap_t::lowest_index]: 66 93
test-error [bitmap_t::lowest_index]: 67 92
test-error [bitmap_t::lowest_index]: 68 91
test-error [bitmap_t::lowest_index]: 69 90
test-error [bitmap_t::lowest_index]: 70 89
test-error [bitmap_t::lowest_index]: 71 88
test-error [bitmap_t::lowest_index]: 72 87
test-error [bitmap_t::lowest_index]: 73 86
test-error [bitmap_t::lowest_index]: 74 85
test-error [bitmap_t::lowest_index]: 75 84
test-error [bitmap_t::lowest_index]: 76 83
test-error [bitmap_t::lowest_index]: 77 82
test-error [bitmap_t::lowest_index]: 78 81
test-error [bitmap_t::lowest_index]: 79 80
test-error [bitmap_t::lowest_index]: 80 79
test-error-1 [bitmap_t::is_one_bit]: 0
test-error-1 [bitmap_t::is_one_bit]: 1
test-error-1 [bitmap_t::is_one_bit]: 2
test-error-1 [bitmap_t::is_one_bit]: 3
test-error-1 [bitmap_t::is_one_bit]: 4
test-error-1 [bitmap_t::is_one_bit]: 5
test-error-1 [bitmap_t::is_one_bit]: 6
test-error-1 [bitmap_t::is_one_bit]: 7
test-error-1 [bitmap_t::is_one_bit]: 8
test-error-1 [bitmap_t::is_one_bit]: 9
test-error-1 [bitmap_t::is_one_bit]: 10
test-error-1 [bitmap_t::is_one_bit]: 11
test-error-1 [bitmap_t::is_one_bit]: 12
test-error-1 [bitmap_t::is_one_bit]: 13
test-error-1 [bitmap_t::is_one_bit]: 14
test-error-1 [bitmap_t::is_one_bit]: 15
test-error-1 [bitmap_t::is_one_bit]: 16
test-error-1 [bitmap_t::is_one_bit]: 17
test-error-1 [bitmap_t::is_one_bit]: 18
test-error-1 [bitmap_t::is_one_bit]: 19
test-error-1 [bitmap_t::is_one_bit]: 20
test-error-1 [bitmap_t::is_one_bit]: 21
test-error-1 [bitmap_t::is_one_bit]: 22
test-error-1 [bitmap_t::is_one_bit]: 23
test-error-1 [bitmap_t::is_one_bit]: 24
test-error-1 [bitmap_t::is_one_bit]: 25
test-error-1 [bitmap_t::is_one_bit]: 26
test-error-1 [bitmap_t::is_one_bit]: 27
test-error-1 [bitmap_t::is_one_bit]: 28
test-error-1 [bitmap_t::is_one_bit]: 29
test-error-1 [bitmap_t::is_one_bit]: 30
test-error-1 [bitmap_t::is_one_b

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Michael Williams

(AMD Phenom CPU)

Michael Williams wrote:
It came through fine the first time.  Gmail probably hid the duplicate 
text from you for convenience since it saw that it was text that you 
already sent out. Or something.


I can compile the original (9x9) bitmap Go files in Windows 7 x64, but 
when I run it I get this:


test-error [bitmap_t::lowest_index]: 0 31
test-error [bitmap_t::lowest_index]: 1 30
test-error [bitmap_t::lowest_index]: 2 29
test-error [bitmap_t::lowest_index]: 3 28
test-error [bitmap_t::lowest_index]: 4 27
test-error [bitmap_t::lowest_index]: 5 26
test-error [bitmap_t::lowest_index]: 6 25
test-error [bitmap_t::lowest_index]: 7 24
test-error [bitmap_t::lowest_index]: 8 23
test-error [bitmap_t::lowest_index]: 9 22
test-error [bitmap_t::lowest_index]: 10 21
test-error [bitmap_t::lowest_index]: 11 20
test-error [bitmap_t::lowest_index]: 12 19
test-error [bitmap_t::lowest_index]: 13 18
test-error [bitmap_t::lowest_index]: 14 17
test-error [bitmap_t::lowest_index]: 15 16
test-error [bitmap_t::lowest_index]: 16 15
test-error [bitmap_t::lowest_index]: 17 14
test-error [bitmap_t::lowest_index]: 18 13
test-error [bitmap_t::lowest_index]: 19 12
test-error [bitmap_t::lowest_index]: 20 11
test-error [bitmap_t::lowest_index]: 21 10
test-error [bitmap_t::lowest_index]: 22 9
test-error [bitmap_t::lowest_index]: 23 8
test-error [bitmap_t::lowest_index]: 24 7
test-error [bitmap_t::lowest_index]: 25 6
test-error [bitmap_t::lowest_index]: 26 5
test-error [bitmap_t::lowest_index]: 27 4
test-error [bitmap_t::lowest_index]: 28 3
test-error [bitmap_t::lowest_index]: 29 2
test-error [bitmap_t::lowest_index]: 30 1
test-error [bitmap_t::lowest_index]: 31 0
test-error [bitmap_t::lowest_index]: 32 63
test-error [bitmap_t::lowest_index]: 33 62
test-error [bitmap_t::lowest_index]: 34 61
test-error [bitmap_t::lowest_index]: 35 60
test-error [bitmap_t::lowest_index]: 36 59
test-error [bitmap_t::lowest_index]: 37 58
test-error [bitmap_t::lowest_index]: 38 57
test-error [bitmap_t::lowest_index]: 39 56
test-error [bitmap_t::lowest_index]: 40 55
test-error [bitmap_t::lowest_index]: 41 54
test-error [bitmap_t::lowest_index]: 42 53
test-error [bitmap_t::lowest_index]: 43 52
test-error [bitmap_t::lowest_index]: 44 51
test-error [bitmap_t::lowest_index]: 45 50
test-error [bitmap_t::lowest_index]: 46 49
test-error [bitmap_t::lowest_index]: 47 48
test-error [bitmap_t::lowest_index]: 48 47
test-error [bitmap_t::lowest_index]: 49 46
test-error [bitmap_t::lowest_index]: 50 45
test-error [bitmap_t::lowest_index]: 51 44
test-error [bitmap_t::lowest_index]: 52 43
test-error [bitmap_t::lowest_index]: 53 42
test-error [bitmap_t::lowest_index]: 54 41
test-error [bitmap_t::lowest_index]: 55 40
test-error [bitmap_t::lowest_index]: 56 39
test-error [bitmap_t::lowest_index]: 57 38
test-error [bitmap_t::lowest_index]: 58 37
test-error [bitmap_t::lowest_index]: 59 36
test-error [bitmap_t::lowest_index]: 60 35
test-error [bitmap_t::lowest_index]: 61 34
test-error [bitmap_t::lowest_index]: 62 33
test-error [bitmap_t::lowest_index]: 63 32
test-error [bitmap_t::lowest_index]: 64 95
test-error [bitmap_t::lowest_index]: 65 94
test-error [bitmap_t::lowest_index]: 66 93
test-error [bitmap_t::lowest_index]: 67 92
test-error [bitmap_t::lowest_index]: 68 91
test-error [bitmap_t::lowest_index]: 69 90
test-error [bitmap_t::lowest_index]: 70 89
test-error [bitmap_t::lowest_index]: 71 88
test-error [bitmap_t::lowest_index]: 72 87
test-error [bitmap_t::lowest_index]: 73 86
test-error [bitmap_t::lowest_index]: 74 85
test-error [bitmap_t::lowest_index]: 75 84
test-error [bitmap_t::lowest_index]: 76 83
test-error [bitmap_t::lowest_index]: 77 82
test-error [bitmap_t::lowest_index]: 78 81
test-error [bitmap_t::lowest_index]: 79 80
test-error [bitmap_t::lowest_index]: 80 79
test-error-1 [bitmap_t::is_one_bit]: 0
test-error-1 [bitmap_t::is_one_bit]: 1
test-error-1 [bitmap_t::is_one_bit]: 2
test-error-1 [bitmap_t::is_one_bit]: 3
test-error-1 [bitmap_t::is_one_bit]: 4
test-error-1 [bitmap_t::is_one_bit]: 5
test-error-1 [bitmap_t::is_one_bit]: 6
test-error-1 [bitmap_t::is_one_bit]: 7
test-error-1 [bitmap_t::is_one_bit]: 8
test-error-1 [bitmap_t::is_one_bit]: 9
test-error-1 [bitmap_t::is_one_bit]: 10
test-error-1 [bitmap_t::is_one_bit]: 11
test-error-1 [bitmap_t::is_one_bit]: 12
test-error-1 [bitmap_t::is_one_bit]: 13
test-error-1 [bitmap_t::is_one_bit]: 14
test-error-1 [bitmap_t::is_one_bit]: 15
test-error-1 [bitmap_t::is_one_bit]: 16
test-error-1 [bitmap_t::is_one_bit]: 17
test-error-1 [bitmap_t::is_one_bit]: 18
test-error-1 [bitmap_t::is_one_bit]: 19
test-error-1 [bitmap_t::is_one_bit]: 20
test-error-1 [bitmap_t::is_one_bit]: 21
test-error-1 [bitmap_t::is_one_bit]: 22
test-error-1 [bitmap_t::is_one_bit]: 23
test-error-1 [bitmap_t::is_one_bit]: 24
test-error-1 [bitmap_t::is_one_bit]: 25
test-error-1 [bitmap_t::is_one_bit]: 26
test-error-1 [bitmap_t::is_one_bit]: 27
test-error-1 [bitmap_t::is_one_bit]: 28
test-error-1 [bitmap_t::is_one_bit]: 29
test-error-1 [bitmap_t::is_

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Mark Boon
2009/8/25 René van de Veerdonk :
> Nonetheless, bitmap-go as a topic keeps resurfacing on this
> mailing list every once in a while and nobody ever put solid data and a
> reference implementation behind it. That is what I wanted to accomplish with
> my mockup.

I think this is interesting information, so I think it's good you
tried it. It basically means unless someone has a much smarter idea
how to implement bit-sets, the performance is not enough to go through
the trouble for most programmers.

> So, it looks that I was overly pessimistic in terms of performance drop per
> move, which is a factor of 2.4x (with little effort to optimize for 19x19).
> But, with 4.1x more moves, this still resulted in a 10x speed penalty. With
> the provided reference numbers, Libego only drops 4.5-5.0x, indicating that
> there is almost no performance loss per move (1.2x). Is this difference
> roughly in line with others expectation?

Yes, I'd expect that. The main speed-determining factor in traditional
implementations is merging. That depends on the average length of the
merge. Obviously this can be much longer on 19x19 as a worst case, but
on average I wouldn't expect it to make much difference compared to
other board-sizes.

Mark
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Zach Wegner
On an initial look, it seems that the shifting, popcounting, and
bitscanning functions can be improved significantly. That's all I
looked at closely, perhaps the other board routines could use some
reworking too.

I'll try my hand at optimizing it if I get the time, but the windowsy
code makes it a bit difficult.

Zach
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread René van de Veerdonk
Hi Micheal,

Thanks for the test (Intel -> AMD, Windows XP -> Windows 7, x32 -> x64,
great, three birds with one stone!). So much for portability. But, hooray
for test-cases!

This may be related to the __lzcnt intrinsic, giving you a completely
different result, i.e., 32 - my result. Could you try replacing this
function in bitmap.cpp? I believe I tested this alternative during
development and it generates only slightly worse performing code.

__forceinline
unsigned int bitmap_t::lowest_index () const {
  const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
  unsigned long m = _mm_movemask_epi8 (comp) ^ 0x;
  if (!_BitScanForward (&m, m)) return 127;
  /* offending instruction */
  //return 8*m + __lzcnt (board.m128i_u32[m/4]);
  /* new code follows */
  unsigned long n;
  _BitScanForward (&n, board.m128i_u32[m/4]);
  return 8*m + n;
}

René

PS. x64 ... new playground, you could use __lzcnt64 ... so many things to
try!

On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams <
michaelwilliam...@gmail.com> wrote:

> (AMD Phenom CPU)
>
>
> Michael Williams wrote:
>
>> It came through fine the first time.  Gmail probably hid the duplicate
>> text from you for convenience since it saw that it was text that you already
>> sent out. Or something.
>>
>> I can compile the original (9x9) bitmap Go files in Windows 7 x64, but
>> when I run it I get this:
>>
>> test-error [bitmap_t::lowest_index]: 0 31
>> test-error [bitmap_t::lowest_index]: 1 30
>> test-error [bitmap_t::lowest_index]: 2 29
>> test-error [bitmap_t::lowest_index]: 3 28
>> test-error [bitmap_t::lowest_index]: 4 27
>> test-error [bitmap_t::lowest_index]: 5 26
>> test-error [bitmap_t::lowest_index]: 6 25
>> test-error [bitmap_t::lowest_index]: 7 24
>> test-error [bitmap_t::lowest_index]: 8 23
>> test-error [bitmap_t::lowest_index]: 9 22
>> test-error [bitmap_t::lowest_index]: 10 21
>> test-error [bitmap_t::lowest_index]: 11 20
>> test-error [bitmap_t::lowest_index]: 12 19
>> test-error [bitmap_t::lowest_index]: 13 18
>> test-error [bitmap_t::lowest_index]: 14 17
>> test-error [bitmap_t::lowest_index]: 15 16
>> test-error [bitmap_t::lowest_index]: 16 15
>> test-error [bitmap_t::lowest_index]: 17 14
>> test-error [bitmap_t::lowest_index]: 18 13
>> test-error [bitmap_t::lowest_index]: 19 12
>> test-error [bitmap_t::lowest_index]: 20 11
>> test-error [bitmap_t::lowest_index]: 21 10
>> test-error [bitmap_t::lowest_index]: 22 9
>> test-error [bitmap_t::lowest_index]: 23 8
>> test-error [bitmap_t::lowest_index]: 24 7
>> test-error [bitmap_t::lowest_index]: 25 6
>> test-error [bitmap_t::lowest_index]: 26 5
>> test-error [bitmap_t::lowest_index]: 27 4
>> test-error [bitmap_t::lowest_index]: 28 3
>> test-error [bitmap_t::lowest_index]: 29 2
>> test-error [bitmap_t::lowest_index]: 30 1
>> test-error [bitmap_t::lowest_index]: 31 0
>> test-error [bitmap_t::lowest_index]: 32 63
>> test-error [bitmap_t::lowest_index]: 33 62
>> test-error [bitmap_t::lowest_index]: 34 61
>> test-error [bitmap_t::lowest_index]: 35 60
>> test-error [bitmap_t::lowest_index]: 36 59
>> test-error [bitmap_t::lowest_index]: 37 58
>> test-error [bitmap_t::lowest_index]: 38 57
>> test-error [bitmap_t::lowest_index]: 39 56
>> test-error [bitmap_t::lowest_index]: 40 55
>> test-error [bitmap_t::lowest_index]: 41 54
>> test-error [bitmap_t::lowest_index]: 42 53
>> test-error [bitmap_t::lowest_index]: 43 52
>> test-error [bitmap_t::lowest_index]: 44 51
>> test-error [bitmap_t::lowest_index]: 45 50
>> test-error [bitmap_t::lowest_index]: 46 49
>> test-error [bitmap_t::lowest_index]: 47 48
>> test-error [bitmap_t::lowest_index]: 48 47
>> test-error [bitmap_t::lowest_index]: 49 46
>> test-error [bitmap_t::lowest_index]: 50 45
>> test-error [bitmap_t::lowest_index]: 51 44
>> test-error [bitmap_t::lowest_index]: 52 43
>> test-error [bitmap_t::lowest_index]: 53 42
>> test-error [bitmap_t::lowest_index]: 54 41
>> test-error [bitmap_t::lowest_index]: 55 40
>> test-error [bitmap_t::lowest_index]: 56 39
>> test-error [bitmap_t::lowest_index]: 57 38
>> test-error [bitmap_t::lowest_index]: 58 37
>> test-error [bitmap_t::lowest_index]: 59 36
>> test-error [bitmap_t::lowest_index]: 60 35
>> test-error [bitmap_t::lowest_index]: 61 34
>> test-error [bitmap_t::lowest_index]: 62 33
>> test-error [bitmap_t::lowest_index]: 63 32
>> test-error [bitmap_t::lowest_index]: 64 95
>> test-error [bitmap_t::lowest_index]: 65 94
>> test-error [bitmap_t::lowest_index]: 66 93
>> test-error [bitmap_t::lowest_index]: 67 92
>> test-error [bitmap_t::lowest_index]: 68 91
>> test-error [bitmap_t::lowest_index]: 69 90
>> test-error [bitmap_t::lowest_index]: 70 89
>> test-error [bitmap_t::lowest_index]: 71 88
>> test-error [bitmap_t::lowest_index]: 72 87
>> test-error [bitmap_t::lowest_index]: 73 86
>> test-error [bitmap_t::lowest_index]: 74 85
>> test-error [bitmap_t::lowest_index]: 75 84
>> test-error [bitmap_t::lowest_index]: 76 83
>> test-error [bitmap_t::lowest_index]: 77 82
>> test-error [bitmap_t::lowest_index]: 78 81

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Michael Williams

That doesn't seem to have any effect on the results.

René van de Veerdonk wrote:

Hi Micheal,

Thanks for the test (Intel -> AMD, Windows XP -> Windows 7, x32 -> x64, 
great, three birds with one stone!). So much for portability. But, 
hooray for test-cases!


This may be related to the __lzcnt intrinsic, giving you a completely 
different result, i.e., 32 - my result. Could you try replacing this 
function in bitmap.cpp? I believe I tested this alternative during 
development and it generates only slightly worse performing code.


__forceinline
unsigned int bitmap_t::lowest_index () const {
  const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
  unsigned long m = _mm_movemask_epi8 (comp) ^ 0x;
  if (!_BitScanForward (&m, m)) return 127;
  /* offending instruction */
  //return 8*m + __lzcnt (board.m128i_u32[m/4]);
  /* new code follows */
  unsigned long n;
  _BitScanForward (&n, board.m128i_u32[m/4]);
  return 8*m + n;
}

René

PS. x64 ... new playground, you could use __lzcnt64 ... so many things 
to try!


On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams 
mailto:michaelwilliam...@gmail.com>> wrote:


(AMD Phenom CPU)


Michael Williams wrote:

It came through fine the first time.  Gmail probably hid the
duplicate text from you for convenience since it saw that it was
text that you already sent out. Or something.

I can compile the original (9x9) bitmap Go files in Windows 7
x64, but when I run it I get this:

test-error [bitmap_t::lowest_index]: 0 31
test-error [bitmap_t::lowest_index]: 1 30
test-error [bitmap_t::lowest_index]: 2 29
test-error [bitmap_t::lowest_index]: 3 28
test-error [bitmap_t::lowest_index]: 4 27
test-error [bitmap_t::lowest_index]: 5 26
test-error [bitmap_t::lowest_index]: 6 25
test-error [bitmap_t::lowest_index]: 7 24
test-error [bitmap_t::lowest_index]: 8 23
test-error [bitmap_t::lowest_index]: 9 22
test-error [bitmap_t::lowest_index]: 10 21
test-error [bitmap_t::lowest_index]: 11 20
test-error [bitmap_t::lowest_index]: 12 19
test-error [bitmap_t::lowest_index]: 13 18
test-error [bitmap_t::lowest_index]: 14 17
test-error [bitmap_t::lowest_index]: 15 16
test-error [bitmap_t::lowest_index]: 16 15
test-error [bitmap_t::lowest_index]: 17 14
test-error [bitmap_t::lowest_index]: 18 13
test-error [bitmap_t::lowest_index]: 19 12
test-error [bitmap_t::lowest_index]: 20 11
test-error [bitmap_t::lowest_index]: 21 10
test-error [bitmap_t::lowest_index]: 22 9
test-error [bitmap_t::lowest_index]: 23 8
test-error [bitmap_t::lowest_index]: 24 7
test-error [bitmap_t::lowest_index]: 25 6
test-error [bitmap_t::lowest_index]: 26 5
test-error [bitmap_t::lowest_index]: 27 4
test-error [bitmap_t::lowest_index]: 28 3
test-error [bitmap_t::lowest_index]: 29 2
test-error [bitmap_t::lowest_index]: 30 1
test-error [bitmap_t::lowest_index]: 31 0
test-error [bitmap_t::lowest_index]: 32 63
test-error [bitmap_t::lowest_index]: 33 62
test-error [bitmap_t::lowest_index]: 34 61
test-error [bitmap_t::lowest_index]: 35 60
test-error [bitmap_t::lowest_index]: 36 59
test-error [bitmap_t::lowest_index]: 37 58
test-error [bitmap_t::lowest_index]: 38 57
test-error [bitmap_t::lowest_index]: 39 56
test-error [bitmap_t::lowest_index]: 40 55
test-error [bitmap_t::lowest_index]: 41 54
test-error [bitmap_t::lowest_index]: 42 53
test-error [bitmap_t::lowest_index]: 43 52
test-error [bitmap_t::lowest_index]: 44 51
test-error [bitmap_t::lowest_index]: 45 50
test-error [bitmap_t::lowest_index]: 46 49
test-error [bitmap_t::lowest_index]: 47 48
test-error [bitmap_t::lowest_index]: 48 47
test-error [bitmap_t::lowest_index]: 49 46
test-error [bitmap_t::lowest_index]: 50 45
test-error [bitmap_t::lowest_index]: 51 44
test-error [bitmap_t::lowest_index]: 52 43
test-error [bitmap_t::lowest_index]: 53 42
test-error [bitmap_t::lowest_index]: 54 41
test-error [bitmap_t::lowest_index]: 55 40
test-error [bitmap_t::lowest_index]: 56 39
test-error [bitmap_t::lowest_index]: 57 38
test-error [bitmap_t::lowest_index]: 58 37
test-error [bitmap_t::lowest_index]: 59 36
test-error [bitmap_t::lowest_index]: 60 35
test-error [bitmap_t::lowest_index]: 61 34
test-error [bitmap_t::lowest_index]: 62 33
test-error [bitmap_t::lowest_index]: 63 32
test-error [bitmap_t::lowest_index]: 64 95
test-error [bitmap_t::lowest_index]: 65 94
test-error [bitmap_t::lowest_index]: 66 93
test-error [bitmap_t::lowest_index]: 67 92
test-error [bitmap_t::lowest_index]

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread René van de Veerdonk
Zach,
I can't help the windowsiness, but thanks for taking a look already.

Popcounting I believe to be optimal for 32-bit only instructions, especially
since I would like to keep the intermediate byte-count to help speed up the
random pick function. There is a really neat trick to speed up the final
accumulations, but I couldn't find the sse2 instructions to implement it.
Bitscanning is indeed awkward the way I implemented it, but I didn't see a
better way to do it as there is no compare instruction that yields a boolean
as the result. I would be interested to see what you come up with for
shifting, I believe that this is where a lot of time is spend.

René

On Tue, Aug 25, 2009 at 6:20 PM, Zach Wegner  wrote:

> On an initial look, it seems that the shifting, popcounting, and
> bitscanning functions can be improved significantly. That's all I
> looked at closely, perhaps the other board routines could use some
> reworking too.
>
> I'll try my hand at optimizing it if I get the time, but the windowsy
> code makes it a bit difficult.
>
> Zach
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Don Dailey
A couple of notes.   Some of us on this computer-go forum are also Chess
programmers and many of write 64 bit chess programs.   I am one of them but
I know there are others.

My 64 bit chess program runs almost 2X faster if I compile it to run on a 64
bit OS - in other words I'm using real 64 bit values to represent the bit
boards.So this really is a big deal - you can expect a pretty good gain.

I also toyed with a bit board style go program and started working on one
long ago.   You still need several 64 bit words to represent the state of
each  color.I started to write some routines that did shift, popcount,
etc as fast as possible.  Basically you had to compile the program for each
board size, and it figured out how many bits were needed and simulated
(using struct) an integer data type that was large enough.   About half way
through I decided not to proceed - at the time I had some idea that I was
more interested in and I never picked it back up - assuming that it would be
too slow (but I didn't prove it.)

I think you could look at glaurung (the chess program) to find great
routines for the common bit operations.  I think the author of Glaurung does
it about as fast as it can be done and even has assembly code for finding
the right-most bit,  or something like that. But you could probably
learn something from looking at that program.

I'm not sure how useful this might be for go, but there are some pretty
clever tricks with minimal perfect hashing - that you might find some use
for.   Perhaps there is a way to make this work for eye-detection or
someting. There is some great explanations here:
http://chessprogramming.wikispaces.com/I think you want to look under
move generation perhaps. Perhaps this cannot be used for GO, but it's
fun to read anyway :-) The only issue of course is that you need
more than 64 bits for go - but the general principles might be made to work.


- Don



2009/8/25 René van de Veerdonk 

> Antoine,
> I apologize for misrepresenting your feelings. What I wanted to convey is
> that your e-mail expressed that with the same amount of computation, other
> implementation strategies provide more information, so the information
> gained for the effort put in is disappointing. In other words, bitmap-go had
> better be much faster to make up for the lack of information gained ... and
> it wasn't. That's pretty much what you are saying as well, I believe. As are
> the others in this thread. I think we can all agree on this.
>
> Going into this project, I was well aware that I was going to end up with
> light playouts only, and that heavy playouts is the way to go except perhaps
> for some special purposes such as ladder readouts. I was also well aware of
> the scaling argument. But, I hadn't thought that this would be the first
> thing thrown at my first post, as there are many programmers that seem to be
> stuck at 9x9. Nonetheless, bitmap-go as a topic keeps resurfacing on this
> mailing list every once in a while and nobody ever put solid data and a
> reference implementation behind it. That is what I wanted to accomplish with
> my mockup.
>
> Confession #2: I have been working on my own MCTS implementation using the
> standard implementation methods that almost everybody else is using. But
> there is a never-ending laundry-list of things that I would like to include
> before I would reach reasonable strength (where reasonable is a moving
> target). In the mean time, there are many others that have demonstrated much
> more capable programmers than I ever will be. So, by providing this
> bitmap-go mockup at least I had the feeling that I was contributing
> something newsworthy to the conversation. This may have never happened if I
> would wait until my MCTS program is ready. I imagine that there are others
> on this list in a similar situation. Besides, this was an interesting
> side-project and perhaps someone else can benefit from it (go for it
> Brian!). And, yes, it was fun.
>
> Okay, enough mesmerizing, on to the main topic.
>
> David, Lukasz,
>
> I did modify my mockup to do 19x19 bitmap-go (attached). It is a hardcoded
> solution using arrays of three 128-bit variables. I did not even attempt to
> optimize this version, so this is not the best possible solution.
> Nonetheless, here is a comparison:
>
> Example output (9x9):
> =
>
> [game] = 30236.1, [moves] = 111.071
> [game] = 30249.7, [moves] = 111.068
> [game] = 30145.7, [moves] = 111.089
> [game] = 30237.7, [moves] = 111.122
> [game] = 30210.1, [moves] = 111.101
>
> [game] = 78.0488 kpps, [moves] = 111.023
> [game] = 78.0488 kpps, [moves] = 111.045
> [game] = 78.0488 kpps, [moves] = 111.046
> [game] = 79.0123 kpps, [moves] = 111.131
> [game] = 78.0488 kpps, [moves] = 111.082
>
> [legal] 110/51, [pick] 110/74, [play] 106/168, [score] 44, [win] 0.4187
> [legal] 111/51, [pick] 111/74, [play] 106/168, [score] 40, [win] 0.4201
> [legal] 111/52, [pick] 111/75, [play] 106/168, [

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread René van de Veerdonk
Mark,
I would argue that 10x vs 5x is not a deal-breaker if bitmaps would support
your particular goals better. They do provide the advantage of automatically
providing you all the liberties of strings, for instance. Requests on how to
do that efficiently come by on this mailing list on a regular basis. But I
agree that it does not look very good in general.

As for merging being time consuming. The merge portion of the play function
looks like this:

  /* merge friendly strings */
  bool single_stone_string = true;
  do {
bitmap_t adj_strings = bm_adjacent & bm_my_points;
while (!adj_strings.is_empty ()) {
  unsigned int i = adj_strings.lowest_index ();
  do {
i = anchor[i];
  } while (i != anchor[i]);
  anchor[i] = move;
  single_stone_string = false;
  adj_strings.clear (string[i]);
  bm_string.set (string[i]);
  bm_liberty.set (liberty[i]);
}
bm_liberty.clear (bm_move);
atari.clear (bm_string);
  } while (false);

The difference between 9x9 and 19x19 is in the average length of the linked
list anchor[]. The merge itself should be the same work, other than the fact
that for 19x19 I now have to perform the work on three elements per bitmap
as compared to just one for 9x9. Moving away from a linked list may prove
beneficial for 19x19, but than you need to ensure that all the anchors for a
string are correctly set during merge. That means looping over all the
stones in the string, which I do not believe to be very efficient with
bitmaps as it requires cumbersome bit-scanning.

René

On Tue, Aug 25, 2009 at 6:09 PM, Mark Boon  wrote:

> 2009/8/25 René van de Veerdonk :
> > Nonetheless, bitmap-go as a topic keeps resurfacing on this
> > mailing list every once in a while and nobody ever put solid data and a
> > reference implementation behind it. That is what I wanted to accomplish
> with
> > my mockup.
>
> I think this is interesting information, so I think it's good you
> tried it. It basically means unless someone has a much smarter idea
> how to implement bit-sets, the performance is not enough to go through
> the trouble for most programmers.
>
> > So, it looks that I was overly pessimistic in terms of performance drop
> per
> > move, which is a factor of 2.4x (with little effort to optimize for
> 19x19).
> > But, with 4.1x more moves, this still resulted in a 10x speed penalty.
> With
> > the provided reference numbers, Libego only drops 4.5-5.0x, indicating
> that
> > there is almost no performance loss per move (1.2x). Is this difference
> > roughly in line with others expectation?
>
> Yes, I'd expect that. The main speed-determining factor in traditional
> implementations is merging. That depends on the average length of the
> merge. Obviously this can be much longer on 19x19 as a worst case, but
> on average I wouldn't expect it to make much difference compared to
> other board-sizes.
>
> Mark
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Michael Williams

Debug build:
testing _BitScanForward (&m, 0): 0 842032512
testing _BitScanForward (&m, 1): 1 0
testing _BitScanForward (&m, 256): 1 8
testing _BitScanReverse (&m, 0): 0 840277132
testing _BitScanReverse (&m, 1): 1 0
testing _BitScanReverse (&m, 256): 1 8

Release build:
testing _BitScanForward (&m, 0): 0 16843009
testing _BitScanForward (&m, 1): 1 0
testing _BitScanForward (&m, 256): 1 8
testing _BitScanReverse (&m, 0): 0 4
testing _BitScanReverse (&m, 1): 1 0
testing _BitScanReverse (&m, 256): 1 8

Sucks when they produce different results.


René van de Veerdonk wrote:

Micheal,

Thanks for trying. Could you try one or two more things?

(1) Replace the _BitScanForward in the new code to _BitScanReverse. This 
probably won't help.


(2) Add this to bitmapgo.cpp:

/* sandbox */
const bool sandbox = true;
void test_sandbox () {
  unsigned long m = 220;
  unsigned char test_it = _BitScanForward (&m, 0);
  std::cout << "testing _BitScanForward (&m, 0): "
<< (int) test_it << " " << m << "\n";
  test_it = _BitScanForward (&m, 1);
  std::cout << "testing _BitScanForward (&m, 1): "
<< (int) test_it << " " << m << "\n";
  test_it = _BitScanForward (&m, 256);
  std::cout << "testing _BitScanForward (&m, 256): "
<< (int) test_it << " " << m << "\n";
  test_it = _BitScanReverse (&m, 0);
  std::cout << "testing _BitScanReverse (&m, 0): "
<< (int) test_it << " " << m << "\n";
  test_it = _BitScanReverse (&m, 1);
  std::cout << "testing _BitScanReverse (&m, 1): "
<< (int) test_it << " " << m << "\n";
  test_it = _BitScanReverse (&m, 256);
  std::cout << "testing _BitScanReverse (&m, 256): "
<< (int) test_it << " " << m << "\n";
}

and add a line something like this in the main program:
  if (test && sandbox) test_sandbox (); /* new line */
  if (test && verify) test = test_bitmap_t::test (); /* already there */

That does not solve anything, but it may tell me if those functions are 
doing for you what they do for me.


My output is:
testing _BitScanForward (&m, 0): 0 16843009
testing _BitScanForward (&m, 1): 1 0
testing _BitScanForward (&m, 256): 1 8
testing _BitScanReverse (&m, 0): 0 3432104
testing _BitScanReverse (&m, 1): 1 0
testing _BitScanReverse (&m, 256): 1 8

Thanks, I really appreciate your help !

René

On Tue, Aug 25, 2009 at 6:38 PM, Michael Williams 
mailto:michaelwilliam...@gmail.com>> wrote:


That doesn't seem to have any effect on the results.


René van de Veerdonk wrote:

Hi Micheal,

Thanks for the test (Intel -> AMD, Windows XP -> Windows 7, x32
-> x64, great, three birds with one stone!). So much for
portability. But, hooray for test-cases!

This may be related to the __lzcnt intrinsic, giving you a
completely different result, i.e., 32 - my result. Could you try
replacing this function in bitmap.cpp? I believe I tested this
alternative during development and it generates only slightly
worse performing code.

__forceinline
unsigned int bitmap_t::lowest_index () const {
 const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
 unsigned long m = _mm_movemask_epi8 (comp) ^ 0x;
 if (!_BitScanForward (&m, m)) return 127;
 /* offending instruction */
 //return 8*m + __lzcnt (board.m128i_u32[m/4]);
 /* new code follows */
 unsigned long n;
 _BitScanForward (&n, board.m128i_u32[m/4]);
 return 8*m + n;
}

René

PS. x64 ... new playground, you could use __lzcnt64 ... so many
things to try!

On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams
mailto:michaelwilliam...@gmail.com>
>> wrote:

   (AMD Phenom CPU)


   Michael Williams wrote:

   It came through fine the first time.  Gmail probably hid the
   duplicate text from you for convenience since it saw that
it was
   text that you already sent out. Or something.

   I can compile the original (9x9) bitmap Go files in Windows 7
   x64, but when I run it I get this:

   test-error [bitmap_t::lowest_index]: 0 31
   test-error [bitmap_t::lowest_index]: 1 30
   test-error [bitmap_t::lowest_index]: 2 29
   test-error [bitmap_t::lowest_index]: 3 28
   test-error [bitmap_t::lowest_index]: 4 27
   test-error [bitmap_t::lowest_index]: 5 26
   test-error [bitmap_t::lowest_index]: 6 25
   test-error [bitmap_t::lowest_index]: 7 24
   test-error [bitmap_t::lowest_index]: 8 23
   test-error [bitmap_t::lowest_index]: 9 22
   test-error [bitmap_t::lowest_index]: 10 21
   test-error [bitmap_t::lowest_index]: 11 20
   test-error [bitmap_t::lowest_index]: 12 19
   test-error [bitmap_t::lowe

Re: [computer-go] Bitmap Go revisited and mockup implementation

2009-08-25 Thread Michael Williams

No.  The code is now:

__forceinline
unsigned int bitmap_t::lowest_index () const {
  const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
  unsigned long m = _mm_movemask_epi8 (comp) ^ 0x;
  if (!_BitScanForward (&m, m)) return 127;
  /* offending instruction */
  //return 8*m + __lzcnt (board.m128i_u32[m/4]);
  /* new code follows */
  unsigned long n;
  _BitScanForward (&n, board.m128i_u32[m/4]);
  return 8*m + (32-n);
}


Debug:
testing _BitScanForward (&m, 0): 0 1099227377
testing _BitScanForward (&m, 1): 1 0
testing _BitScanForward (&m, 256): 1 8
testing _BitScanReverse (&m, 0): 0 1103042705
testing _BitScanReverse (&m, 1): 1 0
testing _BitScanReverse (&m, 256): 1 8
test-error [bitmap_t::lowest_index]: 0 32

Release:
testing _BitScanForward (&m, 0): 0 16843009
testing _BitScanForward (&m, 1): 1 0
testing _BitScanForward (&m, 256): 1 8
testing _BitScanReverse (&m, 0): 0 4
testing _BitScanReverse (&m, 1): 1 0
testing _BitScanReverse (&m, 256): 1 8
test-error [bitmap_t::lowest_index]: 0 32


René van de Veerdonk wrote:

Micheal,

Your output is actually correct. The output is undefined when the second 
argument to _BitScanForward is zero and that is what I am explicitly 
testing for in the method bitmap_t::lowest_index, which is supposed to 
return an index to any set bit (despite the name). Your output shows 
that it returns a number that is off in a very predictable way (it seems 
that replacing "8*m + n" with "8*m + (32-n)" on the last line may do the 
trick for you, but it breaks my local copy). So, now I am positively 
baffled.


Help?

René

On Tue, Aug 25, 2009 at 7:25 PM, Michael Williams 
mailto:michaelwilliam...@gmail.com>> wrote:


Debug build:
testing _BitScanForward (&m, 0): 0 842032512

testing _BitScanForward (&m, 1): 1 0
testing _BitScanForward (&m, 256): 1 8
testing _BitScanReverse (&m, 0): 0 840277132

testing _BitScanReverse (&m, 1): 1 0
testing _BitScanReverse (&m, 256): 1 8

Release build:

testing _BitScanForward (&m, 0): 0 16843009
testing _BitScanForward (&m, 1): 1 0
testing _BitScanForward (&m, 256): 1 8
testing _BitScanReverse (&m, 0): 0 4

testing _BitScanReverse (&m, 1): 1 0
testing _BitScanReverse (&m, 256): 1 8

Sucks when they produce different results.



René van de Veerdonk wrote:

Micheal,

Thanks for trying. Could you try one or two more things?

(1) Replace the _BitScanForward in the new code to
_BitScanReverse. This probably won't help.

(2) Add this to bitmapgo.cpp:

/* sandbox */
const bool sandbox = true;
void test_sandbox () {
 unsigned long m = 220;
 unsigned char test_it = _BitScanForward (&m, 0);
 std::cout << "testing _BitScanForward (&m, 0): "
   << (int) test_it << " " << m << "\n";
 test_it = _BitScanForward (&m, 1);
 std::cout << "testing _BitScanForward (&m, 1): "
   << (int) test_it << " " << m << "\n";
 test_it = _BitScanForward (&m, 256);
 std::cout << "testing _BitScanForward (&m, 256): "
   << (int) test_it << " " << m << "\n";
 test_it = _BitScanReverse (&m, 0);
 std::cout << "testing _BitScanReverse (&m, 0): "
   << (int) test_it << " " << m << "\n";
 test_it = _BitScanReverse (&m, 1);
 std::cout << "testing _BitScanReverse (&m, 1): "
   << (int) test_it << " " << m << "\n";
 test_it = _BitScanReverse (&m, 256);
 std::cout << "testing _BitScanReverse (&m, 256): "
   << (int) test_it << " " << m << "\n";
}

and add a line something like this in the main program:
 if (test && sandbox) test_sandbox (); /* new line */
 if (test && verify) test = test_bitmap_t::test (); /* already
there */

That does not solve anything, but it may tell me if those
functions are doing for you what they do for me.

My output is:
testing _BitScanForward (&m, 0): 0 16843009
testing _BitScanForward (&m, 1): 1 0
testing _BitScanForward (&m, 256): 1 8
testing _BitScanReverse (&m, 0): 0 3432104
testing _BitScanReverse (&m, 1): 1 0
testing _BitScanReverse (&m, 256): 1 8

Thanks, I really appreciate your help !

René

On Tue, Aug 25, 2009 at 6:38 PM, Michael Williams
mailto:michaelwilliam...@gmail.com>
>> wrote:

   That doesn't seem to have any effect on the results.


   René van de Veerdonk wrote:

   Hi Micheal,

   Thanks for the test (Intel -> AMD, Windows XP -> Windows
7, x32
   -> x64, great, three birds with one stone!). So much for
   portability. But, hooray for test-cases!

   This may be related to the __lzcnt intrinsic, giving you a
   completely d