That doesn't seem to have any effect on the results.

René van de Veerdonk wrote:
Hi Micheal,

Thanks for the test (Intel -> AMD, Windows XP -> Windows 7, x32 -> x64, great, three birds with one stone!). So much for portability. But, hooray for test-cases!

This may be related to the __lzcnt intrinsic, giving you a completely different result, i.e., 32 - my result. Could you try replacing this function in bitmap.cpp? I believe I tested this alternative during development and it generates only slightly worse performing code.

__forceinline
unsigned int bitmap_t::lowest_index () const {
  const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board);
  unsigned long m = _mm_movemask_epi8 (comp) ^ 0xffff;
  if (!_BitScanForward (&m, m)) return 127;
  /* offending instruction */
  //return 8*m + __lzcnt (board.m128i_u32[m/4]);
  /* new code follows */
  unsigned long n;
  _BitScanForward (&n, board.m128i_u32[m/4]);
  return 8*m + n;
}

René

PS. x64 ... new playground, you could use __lzcnt64 ... so many things to try!

On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams <michaelwilliam...@gmail.com <mailto:michaelwilliam...@gmail.com>> wrote:

    (AMD Phenom CPU)


    Michael Williams wrote:

        It came through fine the first time.  Gmail probably hid the
        duplicate text from you for convenience since it saw that it was
        text that you already sent out. Or something.

        I can compile the original (9x9) bitmap Go files in Windows 7
        x64, but when I run it I get this:

        test-error [bitmap_t::lowest_index]: 0 31
        test-error [bitmap_t::lowest_index]: 1 30
        test-error [bitmap_t::lowest_index]: 2 29
        test-error [bitmap_t::lowest_index]: 3 28
        test-error [bitmap_t::lowest_index]: 4 27
        test-error [bitmap_t::lowest_index]: 5 26
        test-error [bitmap_t::lowest_index]: 6 25
        test-error [bitmap_t::lowest_index]: 7 24
        test-error [bitmap_t::lowest_index]: 8 23
        test-error [bitmap_t::lowest_index]: 9 22
        test-error [bitmap_t::lowest_index]: 10 21
        test-error [bitmap_t::lowest_index]: 11 20
        test-error [bitmap_t::lowest_index]: 12 19
        test-error [bitmap_t::lowest_index]: 13 18
        test-error [bitmap_t::lowest_index]: 14 17
        test-error [bitmap_t::lowest_index]: 15 16
        test-error [bitmap_t::lowest_index]: 16 15
        test-error [bitmap_t::lowest_index]: 17 14
        test-error [bitmap_t::lowest_index]: 18 13
        test-error [bitmap_t::lowest_index]: 19 12
        test-error [bitmap_t::lowest_index]: 20 11
        test-error [bitmap_t::lowest_index]: 21 10
        test-error [bitmap_t::lowest_index]: 22 9
        test-error [bitmap_t::lowest_index]: 23 8
        test-error [bitmap_t::lowest_index]: 24 7
        test-error [bitmap_t::lowest_index]: 25 6
        test-error [bitmap_t::lowest_index]: 26 5
        test-error [bitmap_t::lowest_index]: 27 4
        test-error [bitmap_t::lowest_index]: 28 3
        test-error [bitmap_t::lowest_index]: 29 2
        test-error [bitmap_t::lowest_index]: 30 1
        test-error [bitmap_t::lowest_index]: 31 0
        test-error [bitmap_t::lowest_index]: 32 63
        test-error [bitmap_t::lowest_index]: 33 62
        test-error [bitmap_t::lowest_index]: 34 61
        test-error [bitmap_t::lowest_index]: 35 60
        test-error [bitmap_t::lowest_index]: 36 59
        test-error [bitmap_t::lowest_index]: 37 58
        test-error [bitmap_t::lowest_index]: 38 57
        test-error [bitmap_t::lowest_index]: 39 56
        test-error [bitmap_t::lowest_index]: 40 55
        test-error [bitmap_t::lowest_index]: 41 54
        test-error [bitmap_t::lowest_index]: 42 53
        test-error [bitmap_t::lowest_index]: 43 52
        test-error [bitmap_t::lowest_index]: 44 51
        test-error [bitmap_t::lowest_index]: 45 50
        test-error [bitmap_t::lowest_index]: 46 49
        test-error [bitmap_t::lowest_index]: 47 48
        test-error [bitmap_t::lowest_index]: 48 47
        test-error [bitmap_t::lowest_index]: 49 46
        test-error [bitmap_t::lowest_index]: 50 45
        test-error [bitmap_t::lowest_index]: 51 44
        test-error [bitmap_t::lowest_index]: 52 43
        test-error [bitmap_t::lowest_index]: 53 42
        test-error [bitmap_t::lowest_index]: 54 41
        test-error [bitmap_t::lowest_index]: 55 40
        test-error [bitmap_t::lowest_index]: 56 39
        test-error [bitmap_t::lowest_index]: 57 38
        test-error [bitmap_t::lowest_index]: 58 37
        test-error [bitmap_t::lowest_index]: 59 36
        test-error [bitmap_t::lowest_index]: 60 35
        test-error [bitmap_t::lowest_index]: 61 34
        test-error [bitmap_t::lowest_index]: 62 33
        test-error [bitmap_t::lowest_index]: 63 32
        test-error [bitmap_t::lowest_index]: 64 95
        test-error [bitmap_t::lowest_index]: 65 94
        test-error [bitmap_t::lowest_index]: 66 93
        test-error [bitmap_t::lowest_index]: 67 92
        test-error [bitmap_t::lowest_index]: 68 91
        test-error [bitmap_t::lowest_index]: 69 90
        test-error [bitmap_t::lowest_index]: 70 89
        test-error [bitmap_t::lowest_index]: 71 88
        test-error [bitmap_t::lowest_index]: 72 87
        test-error [bitmap_t::lowest_index]: 73 86
        test-error [bitmap_t::lowest_index]: 74 85
        test-error [bitmap_t::lowest_index]: 75 84
        test-error [bitmap_t::lowest_index]: 76 83
        test-error [bitmap_t::lowest_index]: 77 82
        test-error [bitmap_t::lowest_index]: 78 81
        test-error [bitmap_t::lowest_index]: 79 80
        test-error [bitmap_t::lowest_index]: 80 79
        test-error-1 [bitmap_t::is_one_bit]: 0
        test-error-1 [bitmap_t::is_one_bit]: 1
        test-error-1 [bitmap_t::is_one_bit]: 2
        test-error-1 [bitmap_t::is_one_bit]: 3
        test-error-1 [bitmap_t::is_one_bit]: 4
        test-error-1 [bitmap_t::is_one_bit]: 5
        test-error-1 [bitmap_t::is_one_bit]: 6
        test-error-1 [bitmap_t::is_one_bit]: 7
        test-error-1 [bitmap_t::is_one_bit]: 8
        test-error-1 [bitmap_t::is_one_bit]: 9
        test-error-1 [bitmap_t::is_one_bit]: 10
        test-error-1 [bitmap_t::is_one_bit]: 11
        test-error-1 [bitmap_t::is_one_bit]: 12
        test-error-1 [bitmap_t::is_one_bit]: 13
        test-error-1 [bitmap_t::is_one_bit]: 14
        test-error-1 [bitmap_t::is_one_bit]: 15
        test-error-1 [bitmap_t::is_one_bit]: 16
        test-error-1 [bitmap_t::is_one_bit]: 17
        test-error-1 [bitmap_t::is_one_bit]: 18
        test-error-1 [bitmap_t::is_one_bit]: 19
        test-error-1 [bitmap_t::is_one_bit]: 20
        test-error-1 [bitmap_t::is_one_bit]: 21
        test-error-1 [bitmap_t::is_one_bit]: 22
        test-error-1 [bitmap_t::is_one_bit]: 23
        test-error-1 [bitmap_t::is_one_bit]: 24
        test-error-1 [bitmap_t::is_one_bit]: 25
        test-error-1 [bitmap_t::is_one_bit]: 26
        test-error-1 [bitmap_t::is_one_bit]: 27
        test-error-1 [bitmap_t::is_one_bit]: 28
        test-error-1 [bitmap_t::is_one_bit]: 29
        test-error-1 [bitmap_t::is_one_bit]: 30
        test-error-1 [bitmap_t::is_one_bit]: 31
        test-error-1 [bitmap_t::is_one_bit]: 32
        test-error-1 [bitmap_t::is_one_bit]: 33
        test-error-1 [bitmap_t::is_one_bit]: 34
        test-error-1 [bitmap_t::is_one_bit]: 35
        test-error-1 [bitmap_t::is_one_bit]: 36
        test-error-1 [bitmap_t::is_one_bit]: 37
        test-error-1 [bitmap_t::is_one_bit]: 38
        test-error-1 [bitmap_t::is_one_bit]: 39
        test-error-1 [bitmap_t::is_one_bit]: 40
        test-error-1 [bitmap_t::is_one_bit]: 41
        test-error-1 [bitmap_t::is_one_bit]: 42
        test-error-1 [bitmap_t::is_one_bit]: 43
        test-error-1 [bitmap_t::is_one_bit]: 44
        test-error-1 [bitmap_t::is_one_bit]: 45
        test-error-1 [bitmap_t::is_one_bit]: 46
        test-error-1 [bitmap_t::is_one_bit]: 47
        test-error-1 [bitmap_t::is_one_bit]: 48
        test-error-1 [bitmap_t::is_one_bit]: 49
        test-error-1 [bitmap_t::is_one_bit]: 50
        test-error-1 [bitmap_t::is_one_bit]: 51
        test-error-1 [bitmap_t::is_one_bit]: 52
        test-error-1 [bitmap_t::is_one_bit]: 53
        test-error-1 [bitmap_t::is_one_bit]: 54
        test-error-1 [bitmap_t::is_one_bit]: 55
        test-error-1 [bitmap_t::is_one_bit]: 56
        test-error-1 [bitmap_t::is_one_bit]: 57
        test-error-1 [bitmap_t::is_one_bit]: 58
        test-error-1 [bitmap_t::is_one_bit]: 59
        test-error-1 [bitmap_t::is_one_bit]: 60
        test-error-1 [bitmap_t::is_one_bit]: 61
        test-error-1 [bitmap_t::is_one_bit]: 62
        test-error-1 [bitmap_t::is_one_bit]: 63
        test-error-1 [bitmap_t::is_one_bit]: 64
        test-error-1 [bitmap_t::is_one_bit]: 65
        test-error-1 [bitmap_t::is_one_bit]: 66
        test-error-1 [bitmap_t::is_one_bit]: 67
        test-error-1 [bitmap_t::is_one_bit]: 68
        test-error-1 [bitmap_t::is_one_bit]: 69
        test-error-1 [bitmap_t::is_one_bit]: 70
        test-error-1 [bitmap_t::is_one_bit]: 71
        test-error-1 [bitmap_t::is_one_bit]: 72
        test-error-1 [bitmap_t::is_one_bit]: 73
        test-error-1 [bitmap_t::is_one_bit]: 74
        test-error-1 [bitmap_t::is_one_bit]: 75
        test-error-1 [bitmap_t::is_one_bit]: 76
        test-error-1 [bitmap_t::is_one_bit]: 77
        test-error-1 [bitmap_t::is_one_bit]: 78
        test-error-1 [bitmap_t::is_one_bit]: 79
        test-error-1 [bitmap_t::is_one_bit]: 80

        And then it crashes.


        René van de Veerdonk wrote:

            [I do not know what happens, but this is the second time the
            computer-go archives miss the body. Maybe its the
            attachment. I apologize for sending this a second time, this
            time without attachment. If someone knows what the problem
            is ... I am using Gmail]

            Antoine,

            I apologize for misrepresenting your feelings. What I wanted
            to convey is that your e-mail expressed that with the same
            amount of computation, other implementation strategies
            provide more information, so the information gained for the
            effort put in is disappointing. In other words, bitmap-go
            had better be much faster to make up for the lack of
            information gained ... and it wasn't. That's pretty much
            what you are saying as well, I believe. As are the others in
            this thread. I think we can all agree on this.

            Going into this project, I was well aware that I was going
            to end up with light playouts only, and that heavy playouts
            is the way to go except perhaps for some special purposes
            such as ladder readouts. I was also well aware of the
            scaling argument. But, I hadn't thought that this would be
            the first thing thrown at my first post, as there are many
            programmers that seem to be stuck at 9x9. Nonetheless,
            bitmap-go as a topic keeps resurfacing on this mailing list
            every once in a while and nobody ever put solid data and a
            reference implementation behind it. That is what I wanted to
            accomplish with my mockup.

            Confession #2: I have been working on my own MCTS
            implementation using the standard implementation methods
            that almost everybody else is using. But there is a
            never-ending laundry-list of things that I would like to
            include before I would reach reasonable strength (where
            reasonable is a moving target). In the mean time, there are
            many others that have demonstrated much more capable
            programmers than I ever will be. So, by providing this
            bitmap-go mockup at least I had the feeling that I was
            contributing something newsworthy to the conversation. This
            may have never happened if I would wait until my MCTS
            program is ready. I imagine that there are others on this
            list in a similar situation. Besides, this was an
            interesting side-project and perhaps someone else can
            benefit from it (go for it Brian!). And, yes, it was fun.

            Okay, enough mesmerizing, on to the main topic.

            David, Lukasz,

            I did modify my mockup to do 19x19 bitmap-go (attached). It
            is a hardcoded solution using arrays of three 128-bit
            variables. I did not even attempt to optimize this version,
            so this is not the best possible solution. Nonetheless, here
            is a comparison:

            Example output (9x9):
            =====================

            [game] = 30236.1, [moves] = 111.071
            [game] = 30249.7, [moves] = 111.068
            [game] = 30145.7, [moves] = 111.089
            [game] = 30237.7, [moves] = 111.122
            [game] = 30210.1, [moves] = 111.101

            [game] = 78.0488 kpps, [moves] = 111.023
            [game] = 78.0488 kpps, [moves] = 111.045
            [game] = 78.0488 kpps, [moves] = 111.046
            [game] = 79.0123 kpps, [moves] = 111.131
            [game] = 78.0488 kpps, [moves] = 111.082

            [legal] 110/51, [pick] 110/74, [play] 106/168, [score] 44,
            [win] 0.4187
            [legal] 111/51, [pick] 111/74, [play] 106/168, [score] 40,
            [win] 0.4201
            [legal] 111/52, [pick] 111/75, [play] 106/168, [score] 42,
            [win] 0.4276
            [legal] 111/52, [pick] 111/74, [play] 106/169, [score] 41,
            [win] 0.4092
            [legal] 111/52, [pick] 111/74, [play] 106/169, [score] 44,
            [win] 0.4221

            Example output (19x19):
            =======================

            [game] = 316452, [moves] = 455.036
            [game] = 316378, [moves] = 455.126
            [game] = 316266, [moves] = 455.177
            [game] = 316017, [moves] = 455.08
            [game] = 316210, [moves] = 455.16

            [game] = 7.45052 kpps, [moves] = 455.267
            [game] = 7.45921 kpps, [moves] = 455.055
            [game] = 7.45921 kpps, [moves] = 455.17
            [game] = 7.45921 kpps, [moves] = 455.188
            [game] = 7.47664 kpps, [moves] = 455.013

            [legal] 454/144, [pick] 454/128, [play] 449/430, [score]
            173, [win] 0.4592
            [legal] 455/144, [pick] 455/128, [play] 449/431, [score]
            173, [win] 0.4655
            [legal] 454/144, [pick] 454/128, [play] 449/430, [score]
            173, [win] 0.4611
            [legal] 454/144, [pick] 454/128, [play] 449/431, [score]
            173, [win] 0.4674
            [legal] 455/144, [pick] 455/128, [play] 450/430, [score]
            175, [win] 0.4661

            Summary:
            ========

            function        9x9   19x19   ratio
            -----------------------------------
            game  [ cc ]  30216  316265   10.47
            game  [kpps]     78    7.46   10.45
            moves           111     455    4.10
            legal [ cc ]     52     144    2.77
            pick  [ cc ]     74     128    1.73
            play  [ cc ]    168     430    2.56
            score [ cc ]     42     173    4.12

            legal+pick+play 294     702    2.39

            So, it looks that I was overly pessimistic in terms of
            performance drop per move, which is a factor of 2.4x (with
            little effort to optimize for 19x19). But, with 4.1x more
            moves, this still resulted in a 10x speed penalty. With the
            provided reference numbers, Libego only drops 4.5-5.0x,
            indicating that there is almost no performance loss per move
            (1.2x). Is this difference roughly in line with others
            expectation?

            With that, I hope this at least provides some data for
            future discussions, and reference implementation for others
            that may want to pick up the ball from here. Someone could
            add a gtp-interface and make it available as one of Don's
            reference implementations for starting go-programmers. I
            would still be interested to improve this solution, but I
            think I need a fresh set of eyes in order to make any more
            progress. I would like to re-iterate that I expect
            significant gains on a 64-bit OS as a result on the doubling
            of the number of available registers. It would be nice if
            someone would be able to post some results. Newer CPU's
            (will) also have instructions available that can
            significantly speed things up (bit-test, compare, popcount,
            wider variables, etc).

            René van de Veerdonk

            On Tue, Aug 25, 2009 at 12:42 PM, Antoine de Maricourt
            <antoine.de-marico...@m4x.org
            <mailto:antoine.de-marico...@m4x.org>
            <mailto:antoine.de-marico...@m4x.org
            <mailto:antoine.de-marico...@m4x.org>>> wrote:

               Hi René,

                   David,

                   Confession: I have not tested 19x19. As you have
            noted, and
                   others before you over the years, a 19x19 board does
            not fit in
                   one but three 128-bit registers and there would be a
            rather big
                   penalty as a result, perhaps (likely?) wiping out all
            of the
                   benefits of bitmaps. Antoine voiced his
            disappointment about the
                   speed advantage even on 9x9 in the e-mail to the list
            that
                   served as my basis. My hope, as Hideko Kato points
            out, is in
                   longer registers or perhaps being able to port this
            to a GPU. A
                   64-bit OS with twice the number registers would also
            surely help
                   out. Nevertheless, I was surprised that I was able to
            get to
                   almost the same level of speed as Libego provides.


               As far as I remember, I was not disappointed by the speed
            itself on
               9x9 boards, but mainly with the following 2 points:

               a) my feeling is that, as you say, it does not scale very
            well on
               19x19 (on current hardware).
               I don't think other "classical" implementations suffer
            such a big
               penalty when scaling from 9x9 to 19x19.

               b) I also felt that this kind of implementation was not
            very flexible.
               For instance, I had another classical implementation,
            running at
               equivalent speed, but in which local 3x3 pattern matching
            was almost
               for free, as well as other more elaborated information.
               When I started to think about introducing 3x3 local
            patterns in the
               bitmap only implementation, it was clear it would not be
            for free.

               At that time, my conclusion was that if one only needs
            pure random
               play with no intelligence at all during playouts, then bitmap
               implementation could compete (at least on 9x9).
               If one needs more elaborated knowledge (such as small
            patterns
               matching, or even knowledge about blocks of stones), then
            pure
               bitmap implementation is probably not so competitive.
               I thus gave up with the idea and jumped to more promising
            ones.

               Anyway, I'm glad my post has been usefull to you. And I
            encourage
               you to improve your implementation and let us know,
            especially if
               you have fun. Starting with something and playing with it
            is a good
               way to have new ideas, even if this makes your initial
            ones look
               less interesting a while after.

               Best regards,

                  Antoine


               _______________________________________________
               computer-go mailing list
               computer-go@computer-go.org
            <mailto:computer-go@computer-go.org>
            <mailto:computer-go@computer-go.org
            <mailto:computer-go@computer-go.org>>
               http://www.computer-go.org/mailman/listinfo/computer-go/




            
------------------------------------------------------------------------

            _______________________________________________
            computer-go mailing list
            computer-go@computer-go.org <mailto:computer-go@computer-go.org>
            http://www.computer-go.org/mailman/listinfo/computer-go/




    _______________________________________________
    computer-go mailing list
    computer-go@computer-go.org <mailto:computer-go@computer-go.org>
    http://www.computer-go.org/mailman/listinfo/computer-go/



------------------------------------------------------------------------

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to