Hi Micheal,

Thanks for the test (Intel -> AMD, Windows XP -> Windows 7, x32 -> x64,
great, three birds with one stone!). So much for portability. But, hooray
for test-cases!

This may be related to the __lzcnt intrinsic, giving you a completely
different result, i.e., 32 - my result. Could you try replacing this
function in bitmap.cpp? I believe I tested this alternative during
development and it generates only slightly worse performing code.

__forceinline
unsigned int bitmap_t::lowest_index () const {
  const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
  unsigned long m = _mm_movemask_epi8 (comp) ^ 0xffff;
  if (!_BitScanForward (&m, m)) return 127;
  /* offending instruction */
  //return 8*m + __lzcnt (board.m128i_u32[m/4]);
  /* new code follows */
  unsigned long n;
  _BitScanForward (&n, board.m128i_u32[m/4]);
  return 8*m + n;
}

René

PS. x64 ... new playground, you could use __lzcnt64 ... so many things to
try!

On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams <
michaelwilliam...@gmail.com> wrote:

> (AMD Phenom CPU)
>
>
> Michael Williams wrote:
>
>> It came through fine the first time.  Gmail probably hid the duplicate
>> text from you for convenience since it saw that it was text that you already
>> sent out. Or something.
>>
>> I can compile the original (9x9) bitmap Go files in Windows 7 x64, but
>> when I run it I get this:
>>
>> test-error [bitmap_t::lowest_index]: 0 31
>> test-error [bitmap_t::lowest_index]: 1 30
>> test-error [bitmap_t::lowest_index]: 2 29
>> test-error [bitmap_t::lowest_index]: 3 28
>> test-error [bitmap_t::lowest_index]: 4 27
>> test-error [bitmap_t::lowest_index]: 5 26
>> test-error [bitmap_t::lowest_index]: 6 25
>> test-error [bitmap_t::lowest_index]: 7 24
>> test-error [bitmap_t::lowest_index]: 8 23
>> test-error [bitmap_t::lowest_index]: 9 22
>> test-error [bitmap_t::lowest_index]: 10 21
>> test-error [bitmap_t::lowest_index]: 11 20
>> test-error [bitmap_t::lowest_index]: 12 19
>> test-error [bitmap_t::lowest_index]: 13 18
>> test-error [bitmap_t::lowest_index]: 14 17
>> test-error [bitmap_t::lowest_index]: 15 16
>> test-error [bitmap_t::lowest_index]: 16 15
>> test-error [bitmap_t::lowest_index]: 17 14
>> test-error [bitmap_t::lowest_index]: 18 13
>> test-error [bitmap_t::lowest_index]: 19 12
>> test-error [bitmap_t::lowest_index]: 20 11
>> test-error [bitmap_t::lowest_index]: 21 10
>> test-error [bitmap_t::lowest_index]: 22 9
>> test-error [bitmap_t::lowest_index]: 23 8
>> test-error [bitmap_t::lowest_index]: 24 7
>> test-error [bitmap_t::lowest_index]: 25 6
>> test-error [bitmap_t::lowest_index]: 26 5
>> test-error [bitmap_t::lowest_index]: 27 4
>> test-error [bitmap_t::lowest_index]: 28 3
>> test-error [bitmap_t::lowest_index]: 29 2
>> test-error [bitmap_t::lowest_index]: 30 1
>> test-error [bitmap_t::lowest_index]: 31 0
>> test-error [bitmap_t::lowest_index]: 32 63
>> test-error [bitmap_t::lowest_index]: 33 62
>> test-error [bitmap_t::lowest_index]: 34 61
>> test-error [bitmap_t::lowest_index]: 35 60
>> test-error [bitmap_t::lowest_index]: 36 59
>> test-error [bitmap_t::lowest_index]: 37 58
>> test-error [bitmap_t::lowest_index]: 38 57
>> test-error [bitmap_t::lowest_index]: 39 56
>> test-error [bitmap_t::lowest_index]: 40 55
>> test-error [bitmap_t::lowest_index]: 41 54
>> test-error [bitmap_t::lowest_index]: 42 53
>> test-error [bitmap_t::lowest_index]: 43 52
>> test-error [bitmap_t::lowest_index]: 44 51
>> test-error [bitmap_t::lowest_index]: 45 50
>> test-error [bitmap_t::lowest_index]: 46 49
>> test-error [bitmap_t::lowest_index]: 47 48
>> test-error [bitmap_t::lowest_index]: 48 47
>> test-error [bitmap_t::lowest_index]: 49 46
>> test-error [bitmap_t::lowest_index]: 50 45
>> test-error [bitmap_t::lowest_index]: 51 44
>> test-error [bitmap_t::lowest_index]: 52 43
>> test-error [bitmap_t::lowest_index]: 53 42
>> test-error [bitmap_t::lowest_index]: 54 41
>> test-error [bitmap_t::lowest_index]: 55 40
>> test-error [bitmap_t::lowest_index]: 56 39
>> test-error [bitmap_t::lowest_index]: 57 38
>> test-error [bitmap_t::lowest_index]: 58 37
>> test-error [bitmap_t::lowest_index]: 59 36
>> test-error [bitmap_t::lowest_index]: 60 35
>> test-error [bitmap_t::lowest_index]: 61 34
>> test-error [bitmap_t::lowest_index]: 62 33
>> test-error [bitmap_t::lowest_index]: 63 32
>> test-error [bitmap_t::lowest_index]: 64 95
>> test-error [bitmap_t::lowest_index]: 65 94
>> test-error [bitmap_t::lowest_index]: 66 93
>> test-error [bitmap_t::lowest_index]: 67 92
>> test-error [bitmap_t::lowest_index]: 68 91
>> test-error [bitmap_t::lowest_index]: 69 90
>> test-error [bitmap_t::lowest_index]: 70 89
>> test-error [bitmap_t::lowest_index]: 71 88
>> test-error [bitmap_t::lowest_index]: 72 87
>> test-error [bitmap_t::lowest_index]: 73 86
>> test-error [bitmap_t::lowest_index]: 74 85
>> test-error [bitmap_t::lowest_index]: 75 84
>> test-error [bitmap_t::lowest_index]: 76 83
>> test-error [bitmap_t::lowest_index]: 77 82
>> test-error [bitmap_t::lowest_index]: 78 81
>> test-error [bitmap_t::lowest_index]: 79 80
>> test-error [bitmap_t::lowest_index]: 80 79
>> test-error-1 [bitmap_t::is_one_bit]: 0
>> test-error-1 [bitmap_t::is_one_bit]: 1
>> test-error-1 [bitmap_t::is_one_bit]: 2
>> test-error-1 [bitmap_t::is_one_bit]: 3
>> test-error-1 [bitmap_t::is_one_bit]: 4
>> test-error-1 [bitmap_t::is_one_bit]: 5
>> test-error-1 [bitmap_t::is_one_bit]: 6
>> test-error-1 [bitmap_t::is_one_bit]: 7
>> test-error-1 [bitmap_t::is_one_bit]: 8
>> test-error-1 [bitmap_t::is_one_bit]: 9
>> test-error-1 [bitmap_t::is_one_bit]: 10
>> test-error-1 [bitmap_t::is_one_bit]: 11
>> test-error-1 [bitmap_t::is_one_bit]: 12
>> test-error-1 [bitmap_t::is_one_bit]: 13
>> test-error-1 [bitmap_t::is_one_bit]: 14
>> test-error-1 [bitmap_t::is_one_bit]: 15
>> test-error-1 [bitmap_t::is_one_bit]: 16
>> test-error-1 [bitmap_t::is_one_bit]: 17
>> test-error-1 [bitmap_t::is_one_bit]: 18
>> test-error-1 [bitmap_t::is_one_bit]: 19
>> test-error-1 [bitmap_t::is_one_bit]: 20
>> test-error-1 [bitmap_t::is_one_bit]: 21
>> test-error-1 [bitmap_t::is_one_bit]: 22
>> test-error-1 [bitmap_t::is_one_bit]: 23
>> test-error-1 [bitmap_t::is_one_bit]: 24
>> test-error-1 [bitmap_t::is_one_bit]: 25
>> test-error-1 [bitmap_t::is_one_bit]: 26
>> test-error-1 [bitmap_t::is_one_bit]: 27
>> test-error-1 [bitmap_t::is_one_bit]: 28
>> test-error-1 [bitmap_t::is_one_bit]: 29
>> test-error-1 [bitmap_t::is_one_bit]: 30
>> test-error-1 [bitmap_t::is_one_bit]: 31
>> test-error-1 [bitmap_t::is_one_bit]: 32
>> test-error-1 [bitmap_t::is_one_bit]: 33
>> test-error-1 [bitmap_t::is_one_bit]: 34
>> test-error-1 [bitmap_t::is_one_bit]: 35
>> test-error-1 [bitmap_t::is_one_bit]: 36
>> test-error-1 [bitmap_t::is_one_bit]: 37
>> test-error-1 [bitmap_t::is_one_bit]: 38
>> test-error-1 [bitmap_t::is_one_bit]: 39
>> test-error-1 [bitmap_t::is_one_bit]: 40
>> test-error-1 [bitmap_t::is_one_bit]: 41
>> test-error-1 [bitmap_t::is_one_bit]: 42
>> test-error-1 [bitmap_t::is_one_bit]: 43
>> test-error-1 [bitmap_t::is_one_bit]: 44
>> test-error-1 [bitmap_t::is_one_bit]: 45
>> test-error-1 [bitmap_t::is_one_bit]: 46
>> test-error-1 [bitmap_t::is_one_bit]: 47
>> test-error-1 [bitmap_t::is_one_bit]: 48
>> test-error-1 [bitmap_t::is_one_bit]: 49
>> test-error-1 [bitmap_t::is_one_bit]: 50
>> test-error-1 [bitmap_t::is_one_bit]: 51
>> test-error-1 [bitmap_t::is_one_bit]: 52
>> test-error-1 [bitmap_t::is_one_bit]: 53
>> test-error-1 [bitmap_t::is_one_bit]: 54
>> test-error-1 [bitmap_t::is_one_bit]: 55
>> test-error-1 [bitmap_t::is_one_bit]: 56
>> test-error-1 [bitmap_t::is_one_bit]: 57
>> test-error-1 [bitmap_t::is_one_bit]: 58
>> test-error-1 [bitmap_t::is_one_bit]: 59
>> test-error-1 [bitmap_t::is_one_bit]: 60
>> test-error-1 [bitmap_t::is_one_bit]: 61
>> test-error-1 [bitmap_t::is_one_bit]: 62
>> test-error-1 [bitmap_t::is_one_bit]: 63
>> test-error-1 [bitmap_t::is_one_bit]: 64
>> test-error-1 [bitmap_t::is_one_bit]: 65
>> test-error-1 [bitmap_t::is_one_bit]: 66
>> test-error-1 [bitmap_t::is_one_bit]: 67
>> test-error-1 [bitmap_t::is_one_bit]: 68
>> test-error-1 [bitmap_t::is_one_bit]: 69
>> test-error-1 [bitmap_t::is_one_bit]: 70
>> test-error-1 [bitmap_t::is_one_bit]: 71
>> test-error-1 [bitmap_t::is_one_bit]: 72
>> test-error-1 [bitmap_t::is_one_bit]: 73
>> test-error-1 [bitmap_t::is_one_bit]: 74
>> test-error-1 [bitmap_t::is_one_bit]: 75
>> test-error-1 [bitmap_t::is_one_bit]: 76
>> test-error-1 [bitmap_t::is_one_bit]: 77
>> test-error-1 [bitmap_t::is_one_bit]: 78
>> test-error-1 [bitmap_t::is_one_bit]: 79
>> test-error-1 [bitmap_t::is_one_bit]: 80
>>
>> And then it crashes.
>>
>>
>> René van de Veerdonk wrote:
>>
>>> [I do not know what happens, but this is the second time the computer-go
>>> archives miss the body. Maybe its the attachment. I apologize for sending
>>> this a second time, this time without attachment. If someone knows what the
>>> problem is ... I am using Gmail]
>>>
>>> Antoine,
>>>
>>> I apologize for misrepresenting your feelings. What I wanted to convey is
>>> that your e-mail expressed that with the same amount of computation, other
>>> implementation strategies provide more information, so the information
>>> gained for the effort put in is disappointing. In other words, bitmap-go had
>>> better be much faster to make up for the lack of information gained ... and
>>> it wasn't. That's pretty much what you are saying as well, I believe. As are
>>> the others in this thread. I think we can all agree on this.
>>>
>>> Going into this project, I was well aware that I was going to end up with
>>> light playouts only, and that heavy playouts is the way to go except perhaps
>>> for some special purposes such as ladder readouts. I was also well aware of
>>> the scaling argument. But, I hadn't thought that this would be the first
>>> thing thrown at my first post, as there are many programmers that seem to be
>>> stuck at 9x9. Nonetheless, bitmap-go as a topic keeps resurfacing on this
>>> mailing list every once in a while and nobody ever put solid data and a
>>> reference implementation behind it. That is what I wanted to accomplish with
>>> my mockup.
>>>
>>> Confession #2: I have been working on my own MCTS implementation using
>>> the standard implementation methods that almost everybody else is using. But
>>> there is a never-ending laundry-list of things that I would like to include
>>> before I would reach reasonable strength (where reasonable is a moving
>>> target). In the mean time, there are many others that have demonstrated much
>>> more capable programmers than I ever will be. So, by providing this
>>> bitmap-go mockup at least I had the feeling that I was contributing
>>> something newsworthy to the conversation. This may have never happened if I
>>> would wait until my MCTS program is ready. I imagine that there are others
>>> on this list in a similar situation. Besides, this was an interesting
>>> side-project and perhaps someone else can benefit from it (go for it
>>> Brian!). And, yes, it was fun.
>>>
>>> Okay, enough mesmerizing, on to the main topic.
>>>
>>> David, Lukasz,
>>>
>>> I did modify my mockup to do 19x19 bitmap-go (attached). It is a
>>> hardcoded solution using arrays of three 128-bit variables. I did not even
>>> attempt to optimize this version, so this is not the best possible solution.
>>> Nonetheless, here is a comparison:
>>>
>>> Example output (9x9):
>>> =====================
>>>
>>> [game] = 30236.1, [moves] = 111.071
>>> [game] = 30249.7, [moves] = 111.068
>>> [game] = 30145.7, [moves] = 111.089
>>> [game] = 30237.7, [moves] = 111.122
>>> [game] = 30210.1, [moves] = 111.101
>>>
>>> [game] = 78.0488 kpps, [moves] = 111.023
>>> [game] = 78.0488 kpps, [moves] = 111.045
>>> [game] = 78.0488 kpps, [moves] = 111.046
>>> [game] = 79.0123 kpps, [moves] = 111.131
>>> [game] = 78.0488 kpps, [moves] = 111.082
>>>
>>> [legal] 110/51, [pick] 110/74, [play] 106/168, [score] 44, [win] 0.4187
>>> [legal] 111/51, [pick] 111/74, [play] 106/168, [score] 40, [win] 0.4201
>>> [legal] 111/52, [pick] 111/75, [play] 106/168, [score] 42, [win] 0.4276
>>> [legal] 111/52, [pick] 111/74, [play] 106/169, [score] 41, [win] 0.4092
>>> [legal] 111/52, [pick] 111/74, [play] 106/169, [score] 44, [win] 0.4221
>>>
>>> Example output (19x19):
>>> =======================
>>>
>>> [game] = 316452, [moves] = 455.036
>>> [game] = 316378, [moves] = 455.126
>>> [game] = 316266, [moves] = 455.177
>>> [game] = 316017, [moves] = 455.08
>>> [game] = 316210, [moves] = 455.16
>>>
>>> [game] = 7.45052 kpps, [moves] = 455.267
>>> [game] = 7.45921 kpps, [moves] = 455.055
>>> [game] = 7.45921 kpps, [moves] = 455.17
>>> [game] = 7.45921 kpps, [moves] = 455.188
>>> [game] = 7.47664 kpps, [moves] = 455.013
>>>
>>> [legal] 454/144, [pick] 454/128, [play] 449/430, [score] 173, [win]
>>> 0.4592
>>> [legal] 455/144, [pick] 455/128, [play] 449/431, [score] 173, [win]
>>> 0.4655
>>> [legal] 454/144, [pick] 454/128, [play] 449/430, [score] 173, [win]
>>> 0.4611
>>> [legal] 454/144, [pick] 454/128, [play] 449/431, [score] 173, [win]
>>> 0.4674
>>> [legal] 455/144, [pick] 455/128, [play] 450/430, [score] 175, [win]
>>> 0.4661
>>>
>>> Summary:
>>> ========
>>>
>>> function        9x9   19x19   ratio
>>> -----------------------------------
>>> game  [ cc ]  30216  316265   10.47
>>> game  [kpps]     78    7.46   10.45
>>> moves           111     455    4.10
>>> legal [ cc ]     52     144    2.77
>>> pick  [ cc ]     74     128    1.73
>>> play  [ cc ]    168     430    2.56
>>> score [ cc ]     42     173    4.12
>>>
>>> legal+pick+play 294     702    2.39
>>>
>>> So, it looks that I was overly pessimistic in terms of performance drop
>>> per move, which is a factor of 2.4x (with little effort to optimize for
>>> 19x19). But, with 4.1x more moves, this still resulted in a 10x speed
>>> penalty. With the provided reference numbers, Libego only drops 4.5-5.0x,
>>> indicating that there is almost no performance loss per move (1.2x). Is this
>>> difference roughly in line with others expectation?
>>>
>>> With that, I hope this at least provides some data for future
>>> discussions, and reference implementation for others that may want to pick
>>> up the ball from here. Someone could add a gtp-interface and make it
>>> available as one of Don's reference implementations for starting
>>> go-programmers. I would still be interested to improve this solution, but I
>>> think I need a fresh set of eyes in order to make any more progress. I would
>>> like to re-iterate that I expect significant gains on a 64-bit OS as a
>>> result on the doubling of the number of available registers. It would be
>>> nice if someone would be able to post some results. Newer CPU's (will) also
>>> have instructions available that can significantly speed things up
>>> (bit-test, compare, popcount, wider variables, etc).
>>>
>>> René van de Veerdonk
>>>
>>> On Tue, Aug 25, 2009 at 12:42 PM, Antoine de Maricourt <
>>> antoine.de-marico...@m4x.org <mailto:antoine.de-marico...@m4x.org>>
>>> wrote:
>>>
>>>    Hi René,
>>>
>>>        David,
>>>
>>>        Confession: I have not tested 19x19. As you have noted, and
>>>        others before you over the years, a 19x19 board does not fit in
>>>        one but three 128-bit registers and there would be a rather big
>>>        penalty as a result, perhaps (likely?) wiping out all of the
>>>        benefits of bitmaps. Antoine voiced his disappointment about the
>>>        speed advantage even on 9x9 in the e-mail to the list that
>>>        served as my basis. My hope, as Hideko Kato points out, is in
>>>        longer registers or perhaps being able to port this to a GPU. A
>>>        64-bit OS with twice the number registers would also surely help
>>>        out. Nevertheless, I was surprised that I was able to get to
>>>        almost the same level of speed as Libego provides.
>>>
>>>
>>>    As far as I remember, I was not disappointed by the speed itself on
>>>    9x9 boards, but mainly with the following 2 points:
>>>
>>>    a) my feeling is that, as you say, it does not scale very well on
>>>    19x19 (on current hardware).
>>>    I don't think other "classical" implementations suffer such a big
>>>    penalty when scaling from 9x9 to 19x19.
>>>
>>>    b) I also felt that this kind of implementation was not very flexible.
>>>    For instance, I had another classical implementation, running at
>>>    equivalent speed, but in which local 3x3 pattern matching was almost
>>>    for free, as well as other more elaborated information.
>>>    When I started to think about introducing 3x3 local patterns in the
>>>    bitmap only implementation, it was clear it would not be for free.
>>>
>>>    At that time, my conclusion was that if one only needs pure random
>>>    play with no intelligence at all during playouts, then bitmap
>>>    implementation could compete (at least on 9x9).
>>>    If one needs more elaborated knowledge (such as small patterns
>>>    matching, or even knowledge about blocks of stones), then pure
>>>    bitmap implementation is probably not so competitive.
>>>    I thus gave up with the idea and jumped to more promising ones.
>>>
>>>    Anyway, I'm glad my post has been usefull to you. And I encourage
>>>    you to improve your implementation and let us know, especially if
>>>    you have fun. Starting with something and playing with it is a good
>>>    way to have new ideas, even if this makes your initial ones look
>>>    less interesting a while after.
>>>
>>>    Best regards,
>>>
>>>       Antoine
>>>
>>>
>>>    _______________________________________________
>>>    computer-go mailing list
>>>    computer-go@computer-go.org <mailto:computer-go@computer-go.org>
>>>    http://www.computer-go.org/mailman/listinfo/computer-go/
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> computer-go mailing list
>>> computer-go@computer-go.org
>>> http://www.computer-go.org/mailman/listinfo/computer-go/
>>>
>>
>>
>>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to