Hi Micheal, Thanks for the test (Intel -> AMD, Windows XP -> Windows 7, x32 -> x64, great, three birds with one stone!). So much for portability. But, hooray for test-cases!
This may be related to the __lzcnt intrinsic, giving you a completely different result, i.e., 32 - my result. Could you try replacing this function in bitmap.cpp? I believe I tested this alternative during development and it generates only slightly worse performing code. __forceinline unsigned int bitmap_t::lowest_index () const { const __m128i comp = _mm_cmpeq_epi32 (_mm_setzero_si128 (), board); unsigned long m = _mm_movemask_epi8 (comp) ^ 0xffff; if (!_BitScanForward (&m, m)) return 127; /* offending instruction */ //return 8*m + __lzcnt (board.m128i_u32[m/4]); /* new code follows */ unsigned long n; _BitScanForward (&n, board.m128i_u32[m/4]); return 8*m + n; } René PS. x64 ... new playground, you could use __lzcnt64 ... so many things to try! On Tue, Aug 25, 2009 at 5:43 PM, Michael Williams < michaelwilliam...@gmail.com> wrote: > (AMD Phenom CPU) > > > Michael Williams wrote: > >> It came through fine the first time. Gmail probably hid the duplicate >> text from you for convenience since it saw that it was text that you already >> sent out. Or something. >> >> I can compile the original (9x9) bitmap Go files in Windows 7 x64, but >> when I run it I get this: >> >> test-error [bitmap_t::lowest_index]: 0 31 >> test-error [bitmap_t::lowest_index]: 1 30 >> test-error [bitmap_t::lowest_index]: 2 29 >> test-error [bitmap_t::lowest_index]: 3 28 >> test-error [bitmap_t::lowest_index]: 4 27 >> test-error [bitmap_t::lowest_index]: 5 26 >> test-error [bitmap_t::lowest_index]: 6 25 >> test-error [bitmap_t::lowest_index]: 7 24 >> test-error [bitmap_t::lowest_index]: 8 23 >> test-error [bitmap_t::lowest_index]: 9 22 >> test-error [bitmap_t::lowest_index]: 10 21 >> test-error [bitmap_t::lowest_index]: 11 20 >> test-error [bitmap_t::lowest_index]: 12 19 >> test-error [bitmap_t::lowest_index]: 13 18 >> test-error [bitmap_t::lowest_index]: 14 17 >> test-error [bitmap_t::lowest_index]: 15 16 >> test-error [bitmap_t::lowest_index]: 16 15 >> test-error [bitmap_t::lowest_index]: 17 14 >> test-error [bitmap_t::lowest_index]: 18 13 >> test-error [bitmap_t::lowest_index]: 19 12 >> test-error [bitmap_t::lowest_index]: 20 11 >> test-error [bitmap_t::lowest_index]: 21 10 >> test-error [bitmap_t::lowest_index]: 22 9 >> test-error [bitmap_t::lowest_index]: 23 8 >> test-error [bitmap_t::lowest_index]: 24 7 >> test-error [bitmap_t::lowest_index]: 25 6 >> test-error [bitmap_t::lowest_index]: 26 5 >> test-error [bitmap_t::lowest_index]: 27 4 >> test-error [bitmap_t::lowest_index]: 28 3 >> test-error [bitmap_t::lowest_index]: 29 2 >> test-error [bitmap_t::lowest_index]: 30 1 >> test-error [bitmap_t::lowest_index]: 31 0 >> test-error [bitmap_t::lowest_index]: 32 63 >> test-error [bitmap_t::lowest_index]: 33 62 >> test-error [bitmap_t::lowest_index]: 34 61 >> test-error [bitmap_t::lowest_index]: 35 60 >> test-error [bitmap_t::lowest_index]: 36 59 >> test-error [bitmap_t::lowest_index]: 37 58 >> test-error [bitmap_t::lowest_index]: 38 57 >> test-error [bitmap_t::lowest_index]: 39 56 >> test-error [bitmap_t::lowest_index]: 40 55 >> test-error [bitmap_t::lowest_index]: 41 54 >> test-error [bitmap_t::lowest_index]: 42 53 >> test-error [bitmap_t::lowest_index]: 43 52 >> test-error [bitmap_t::lowest_index]: 44 51 >> test-error [bitmap_t::lowest_index]: 45 50 >> test-error [bitmap_t::lowest_index]: 46 49 >> test-error [bitmap_t::lowest_index]: 47 48 >> test-error [bitmap_t::lowest_index]: 48 47 >> test-error [bitmap_t::lowest_index]: 49 46 >> test-error [bitmap_t::lowest_index]: 50 45 >> test-error [bitmap_t::lowest_index]: 51 44 >> test-error [bitmap_t::lowest_index]: 52 43 >> test-error [bitmap_t::lowest_index]: 53 42 >> test-error [bitmap_t::lowest_index]: 54 41 >> test-error [bitmap_t::lowest_index]: 55 40 >> test-error [bitmap_t::lowest_index]: 56 39 >> test-error [bitmap_t::lowest_index]: 57 38 >> test-error [bitmap_t::lowest_index]: 58 37 >> test-error [bitmap_t::lowest_index]: 59 36 >> test-error [bitmap_t::lowest_index]: 60 35 >> test-error [bitmap_t::lowest_index]: 61 34 >> test-error [bitmap_t::lowest_index]: 62 33 >> test-error [bitmap_t::lowest_index]: 63 32 >> test-error [bitmap_t::lowest_index]: 64 95 >> test-error [bitmap_t::lowest_index]: 65 94 >> test-error [bitmap_t::lowest_index]: 66 93 >> test-error [bitmap_t::lowest_index]: 67 92 >> test-error [bitmap_t::lowest_index]: 68 91 >> test-error [bitmap_t::lowest_index]: 69 90 >> test-error [bitmap_t::lowest_index]: 70 89 >> test-error [bitmap_t::lowest_index]: 71 88 >> test-error [bitmap_t::lowest_index]: 72 87 >> test-error [bitmap_t::lowest_index]: 73 86 >> test-error [bitmap_t::lowest_index]: 74 85 >> test-error [bitmap_t::lowest_index]: 75 84 >> test-error [bitmap_t::lowest_index]: 76 83 >> test-error [bitmap_t::lowest_index]: 77 82 >> test-error [bitmap_t::lowest_index]: 78 81 >> test-error [bitmap_t::lowest_index]: 79 80 >> test-error [bitmap_t::lowest_index]: 80 79 >> test-error-1 [bitmap_t::is_one_bit]: 0 >> test-error-1 [bitmap_t::is_one_bit]: 1 >> test-error-1 [bitmap_t::is_one_bit]: 2 >> test-error-1 [bitmap_t::is_one_bit]: 3 >> test-error-1 [bitmap_t::is_one_bit]: 4 >> test-error-1 [bitmap_t::is_one_bit]: 5 >> test-error-1 [bitmap_t::is_one_bit]: 6 >> test-error-1 [bitmap_t::is_one_bit]: 7 >> test-error-1 [bitmap_t::is_one_bit]: 8 >> test-error-1 [bitmap_t::is_one_bit]: 9 >> test-error-1 [bitmap_t::is_one_bit]: 10 >> test-error-1 [bitmap_t::is_one_bit]: 11 >> test-error-1 [bitmap_t::is_one_bit]: 12 >> test-error-1 [bitmap_t::is_one_bit]: 13 >> test-error-1 [bitmap_t::is_one_bit]: 14 >> test-error-1 [bitmap_t::is_one_bit]: 15 >> test-error-1 [bitmap_t::is_one_bit]: 16 >> test-error-1 [bitmap_t::is_one_bit]: 17 >> test-error-1 [bitmap_t::is_one_bit]: 18 >> test-error-1 [bitmap_t::is_one_bit]: 19 >> test-error-1 [bitmap_t::is_one_bit]: 20 >> test-error-1 [bitmap_t::is_one_bit]: 21 >> test-error-1 [bitmap_t::is_one_bit]: 22 >> test-error-1 [bitmap_t::is_one_bit]: 23 >> test-error-1 [bitmap_t::is_one_bit]: 24 >> test-error-1 [bitmap_t::is_one_bit]: 25 >> test-error-1 [bitmap_t::is_one_bit]: 26 >> test-error-1 [bitmap_t::is_one_bit]: 27 >> test-error-1 [bitmap_t::is_one_bit]: 28 >> test-error-1 [bitmap_t::is_one_bit]: 29 >> test-error-1 [bitmap_t::is_one_bit]: 30 >> test-error-1 [bitmap_t::is_one_bit]: 31 >> test-error-1 [bitmap_t::is_one_bit]: 32 >> test-error-1 [bitmap_t::is_one_bit]: 33 >> test-error-1 [bitmap_t::is_one_bit]: 34 >> test-error-1 [bitmap_t::is_one_bit]: 35 >> test-error-1 [bitmap_t::is_one_bit]: 36 >> test-error-1 [bitmap_t::is_one_bit]: 37 >> test-error-1 [bitmap_t::is_one_bit]: 38 >> test-error-1 [bitmap_t::is_one_bit]: 39 >> test-error-1 [bitmap_t::is_one_bit]: 40 >> test-error-1 [bitmap_t::is_one_bit]: 41 >> test-error-1 [bitmap_t::is_one_bit]: 42 >> test-error-1 [bitmap_t::is_one_bit]: 43 >> test-error-1 [bitmap_t::is_one_bit]: 44 >> test-error-1 [bitmap_t::is_one_bit]: 45 >> test-error-1 [bitmap_t::is_one_bit]: 46 >> test-error-1 [bitmap_t::is_one_bit]: 47 >> test-error-1 [bitmap_t::is_one_bit]: 48 >> test-error-1 [bitmap_t::is_one_bit]: 49 >> test-error-1 [bitmap_t::is_one_bit]: 50 >> test-error-1 [bitmap_t::is_one_bit]: 51 >> test-error-1 [bitmap_t::is_one_bit]: 52 >> test-error-1 [bitmap_t::is_one_bit]: 53 >> test-error-1 [bitmap_t::is_one_bit]: 54 >> test-error-1 [bitmap_t::is_one_bit]: 55 >> test-error-1 [bitmap_t::is_one_bit]: 56 >> test-error-1 [bitmap_t::is_one_bit]: 57 >> test-error-1 [bitmap_t::is_one_bit]: 58 >> test-error-1 [bitmap_t::is_one_bit]: 59 >> test-error-1 [bitmap_t::is_one_bit]: 60 >> test-error-1 [bitmap_t::is_one_bit]: 61 >> test-error-1 [bitmap_t::is_one_bit]: 62 >> test-error-1 [bitmap_t::is_one_bit]: 63 >> test-error-1 [bitmap_t::is_one_bit]: 64 >> test-error-1 [bitmap_t::is_one_bit]: 65 >> test-error-1 [bitmap_t::is_one_bit]: 66 >> test-error-1 [bitmap_t::is_one_bit]: 67 >> test-error-1 [bitmap_t::is_one_bit]: 68 >> test-error-1 [bitmap_t::is_one_bit]: 69 >> test-error-1 [bitmap_t::is_one_bit]: 70 >> test-error-1 [bitmap_t::is_one_bit]: 71 >> test-error-1 [bitmap_t::is_one_bit]: 72 >> test-error-1 [bitmap_t::is_one_bit]: 73 >> test-error-1 [bitmap_t::is_one_bit]: 74 >> test-error-1 [bitmap_t::is_one_bit]: 75 >> test-error-1 [bitmap_t::is_one_bit]: 76 >> test-error-1 [bitmap_t::is_one_bit]: 77 >> test-error-1 [bitmap_t::is_one_bit]: 78 >> test-error-1 [bitmap_t::is_one_bit]: 79 >> test-error-1 [bitmap_t::is_one_bit]: 80 >> >> And then it crashes. >> >> >> René van de Veerdonk wrote: >> >>> [I do not know what happens, but this is the second time the computer-go >>> archives miss the body. Maybe its the attachment. I apologize for sending >>> this a second time, this time without attachment. If someone knows what the >>> problem is ... I am using Gmail] >>> >>> Antoine, >>> >>> I apologize for misrepresenting your feelings. What I wanted to convey is >>> that your e-mail expressed that with the same amount of computation, other >>> implementation strategies provide more information, so the information >>> gained for the effort put in is disappointing. In other words, bitmap-go had >>> better be much faster to make up for the lack of information gained ... and >>> it wasn't. That's pretty much what you are saying as well, I believe. As are >>> the others in this thread. I think we can all agree on this. >>> >>> Going into this project, I was well aware that I was going to end up with >>> light playouts only, and that heavy playouts is the way to go except perhaps >>> for some special purposes such as ladder readouts. I was also well aware of >>> the scaling argument. But, I hadn't thought that this would be the first >>> thing thrown at my first post, as there are many programmers that seem to be >>> stuck at 9x9. Nonetheless, bitmap-go as a topic keeps resurfacing on this >>> mailing list every once in a while and nobody ever put solid data and a >>> reference implementation behind it. That is what I wanted to accomplish with >>> my mockup. >>> >>> Confession #2: I have been working on my own MCTS implementation using >>> the standard implementation methods that almost everybody else is using. But >>> there is a never-ending laundry-list of things that I would like to include >>> before I would reach reasonable strength (where reasonable is a moving >>> target). In the mean time, there are many others that have demonstrated much >>> more capable programmers than I ever will be. So, by providing this >>> bitmap-go mockup at least I had the feeling that I was contributing >>> something newsworthy to the conversation. This may have never happened if I >>> would wait until my MCTS program is ready. I imagine that there are others >>> on this list in a similar situation. Besides, this was an interesting >>> side-project and perhaps someone else can benefit from it (go for it >>> Brian!). And, yes, it was fun. >>> >>> Okay, enough mesmerizing, on to the main topic. >>> >>> David, Lukasz, >>> >>> I did modify my mockup to do 19x19 bitmap-go (attached). It is a >>> hardcoded solution using arrays of three 128-bit variables. I did not even >>> attempt to optimize this version, so this is not the best possible solution. >>> Nonetheless, here is a comparison: >>> >>> Example output (9x9): >>> ===================== >>> >>> [game] = 30236.1, [moves] = 111.071 >>> [game] = 30249.7, [moves] = 111.068 >>> [game] = 30145.7, [moves] = 111.089 >>> [game] = 30237.7, [moves] = 111.122 >>> [game] = 30210.1, [moves] = 111.101 >>> >>> [game] = 78.0488 kpps, [moves] = 111.023 >>> [game] = 78.0488 kpps, [moves] = 111.045 >>> [game] = 78.0488 kpps, [moves] = 111.046 >>> [game] = 79.0123 kpps, [moves] = 111.131 >>> [game] = 78.0488 kpps, [moves] = 111.082 >>> >>> [legal] 110/51, [pick] 110/74, [play] 106/168, [score] 44, [win] 0.4187 >>> [legal] 111/51, [pick] 111/74, [play] 106/168, [score] 40, [win] 0.4201 >>> [legal] 111/52, [pick] 111/75, [play] 106/168, [score] 42, [win] 0.4276 >>> [legal] 111/52, [pick] 111/74, [play] 106/169, [score] 41, [win] 0.4092 >>> [legal] 111/52, [pick] 111/74, [play] 106/169, [score] 44, [win] 0.4221 >>> >>> Example output (19x19): >>> ======================= >>> >>> [game] = 316452, [moves] = 455.036 >>> [game] = 316378, [moves] = 455.126 >>> [game] = 316266, [moves] = 455.177 >>> [game] = 316017, [moves] = 455.08 >>> [game] = 316210, [moves] = 455.16 >>> >>> [game] = 7.45052 kpps, [moves] = 455.267 >>> [game] = 7.45921 kpps, [moves] = 455.055 >>> [game] = 7.45921 kpps, [moves] = 455.17 >>> [game] = 7.45921 kpps, [moves] = 455.188 >>> [game] = 7.47664 kpps, [moves] = 455.013 >>> >>> [legal] 454/144, [pick] 454/128, [play] 449/430, [score] 173, [win] >>> 0.4592 >>> [legal] 455/144, [pick] 455/128, [play] 449/431, [score] 173, [win] >>> 0.4655 >>> [legal] 454/144, [pick] 454/128, [play] 449/430, [score] 173, [win] >>> 0.4611 >>> [legal] 454/144, [pick] 454/128, [play] 449/431, [score] 173, [win] >>> 0.4674 >>> [legal] 455/144, [pick] 455/128, [play] 450/430, [score] 175, [win] >>> 0.4661 >>> >>> Summary: >>> ======== >>> >>> function 9x9 19x19 ratio >>> ----------------------------------- >>> game [ cc ] 30216 316265 10.47 >>> game [kpps] 78 7.46 10.45 >>> moves 111 455 4.10 >>> legal [ cc ] 52 144 2.77 >>> pick [ cc ] 74 128 1.73 >>> play [ cc ] 168 430 2.56 >>> score [ cc ] 42 173 4.12 >>> >>> legal+pick+play 294 702 2.39 >>> >>> So, it looks that I was overly pessimistic in terms of performance drop >>> per move, which is a factor of 2.4x (with little effort to optimize for >>> 19x19). But, with 4.1x more moves, this still resulted in a 10x speed >>> penalty. With the provided reference numbers, Libego only drops 4.5-5.0x, >>> indicating that there is almost no performance loss per move (1.2x). Is this >>> difference roughly in line with others expectation? >>> >>> With that, I hope this at least provides some data for future >>> discussions, and reference implementation for others that may want to pick >>> up the ball from here. Someone could add a gtp-interface and make it >>> available as one of Don's reference implementations for starting >>> go-programmers. I would still be interested to improve this solution, but I >>> think I need a fresh set of eyes in order to make any more progress. I would >>> like to re-iterate that I expect significant gains on a 64-bit OS as a >>> result on the doubling of the number of available registers. It would be >>> nice if someone would be able to post some results. Newer CPU's (will) also >>> have instructions available that can significantly speed things up >>> (bit-test, compare, popcount, wider variables, etc). >>> >>> René van de Veerdonk >>> >>> On Tue, Aug 25, 2009 at 12:42 PM, Antoine de Maricourt < >>> antoine.de-marico...@m4x.org <mailto:antoine.de-marico...@m4x.org>> >>> wrote: >>> >>> Hi René, >>> >>> David, >>> >>> Confession: I have not tested 19x19. As you have noted, and >>> others before you over the years, a 19x19 board does not fit in >>> one but three 128-bit registers and there would be a rather big >>> penalty as a result, perhaps (likely?) wiping out all of the >>> benefits of bitmaps. Antoine voiced his disappointment about the >>> speed advantage even on 9x9 in the e-mail to the list that >>> served as my basis. My hope, as Hideko Kato points out, is in >>> longer registers or perhaps being able to port this to a GPU. A >>> 64-bit OS with twice the number registers would also surely help >>> out. Nevertheless, I was surprised that I was able to get to >>> almost the same level of speed as Libego provides. >>> >>> >>> As far as I remember, I was not disappointed by the speed itself on >>> 9x9 boards, but mainly with the following 2 points: >>> >>> a) my feeling is that, as you say, it does not scale very well on >>> 19x19 (on current hardware). >>> I don't think other "classical" implementations suffer such a big >>> penalty when scaling from 9x9 to 19x19. >>> >>> b) I also felt that this kind of implementation was not very flexible. >>> For instance, I had another classical implementation, running at >>> equivalent speed, but in which local 3x3 pattern matching was almost >>> for free, as well as other more elaborated information. >>> When I started to think about introducing 3x3 local patterns in the >>> bitmap only implementation, it was clear it would not be for free. >>> >>> At that time, my conclusion was that if one only needs pure random >>> play with no intelligence at all during playouts, then bitmap >>> implementation could compete (at least on 9x9). >>> If one needs more elaborated knowledge (such as small patterns >>> matching, or even knowledge about blocks of stones), then pure >>> bitmap implementation is probably not so competitive. >>> I thus gave up with the idea and jumped to more promising ones. >>> >>> Anyway, I'm glad my post has been usefull to you. And I encourage >>> you to improve your implementation and let us know, especially if >>> you have fun. Starting with something and playing with it is a good >>> way to have new ideas, even if this makes your initial ones look >>> less interesting a while after. >>> >>> Best regards, >>> >>> Antoine >>> >>> >>> _______________________________________________ >>> computer-go mailing list >>> computer-go@computer-go.org <mailto:computer-go@computer-go.org> >>> http://www.computer-go.org/mailman/listinfo/computer-go/ >>> >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> computer-go mailing list >>> computer-go@computer-go.org >>> http://www.computer-go.org/mailman/listinfo/computer-go/ >>> >> >> >> > _______________________________________________ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ >
_______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/