Mark Boon wrote:

What I have now takes 10-15 microseconds on a 2Ghz Intel computer to find all the patterns on a board (on average for 9x9, for 19x19 it's more like 15-20 usec)

From your difference between 9x9 and 19x19 I assume that you are updating
the patterns of the cells after a stone was placed, (else the difference should be 361/81 times) not a computation from scratch. I make this sure so that we compare apples to apples. As far as the patterns computing is
concerned, (i.e. excluding the board part verifying captures, etc.) for
a pattern of 40 neighbors I get times easily below 500 nanoseconds (even on an old P4 3.4 GHz) for updating 40 hashes. I explained that about June last year, when I wrote it. Since it passed my tests (June 07) I have never changed a character in the code that writes the 67090 asm lines. It is just a black box, that works 100%. Each board cell has an updating function that knows where the board limits are, the resulting
code (for hash 32 bit mode) is something like an endless:

          lea     ebx, [edx + ecx*2 - SizeOf_bccCell]
          mov     eax, 0967f6762h
          bt      dword ptr [ebx], bidx_StoneBit
          cmovc   eax, esi
          xor     [ebx + 4], eax
          lea     ebx, [edx + ecx*2]
          mov     eax, 0d83b6518h
          bt      dword ptr [ebx], bidx_StoneBit
          cmovc   eax, esi
          xor     [ebx + 4], eax
          lea     ebx, [edx + ecx*2 + SizeOf_bccCell]
          mov     eax, 0121001f7h
          bt      dword ptr [ebx], bidx_StoneBit
          cmovc   eax, esi
          xor     [ebx + 4], eax

The longest of these functions has 206 lines, the typical about 120. There
is not a single conditional jump, no vars in RAM, etc.


It's written in Java so making it in C would possibly give a two-fold
speedup.

I hate the Java war but, when I said that it is taking a 10 year handicap, at least for patterns that is true with no doubt. My code running on 1998 hardware (it is compatible) outperforms Java code for updating patterns.


Somehow smaller sets don't go much faster, but larger sets do slow down, every ten-fold increase in number of patterns seems to double the time.

What I wrote above updates the hashes (or the masks, because the board has many modes) but does not search the hash to get urgency. For searching
in a sorted list, I use the second best language after automated asm ..

.. manually written asm, of course ;-)

;; One iteration:
;; --------------
CompStep        MACRO   name

                mov     eax, ebx
                add     eax, ecx
                shr     eax, 1
                and     al,  0fch       ;; Align to a valid pointer
                cmp     edx, [eax]
                jz      &name&out
                cmovc   ecx, eax        ;; This is true if [eax] > value
                cmovnc  ebx, eax        ;; This is true if the previous is false

                ENDM

Also (almost) without conditional jumps. The only conditional jump is selected
once when the hash is found and exits. 10 steps of that can search in a 1024
long list, 20 steps in 1024^2. A doubling in table size is just adding one step,
(8 instructions, say 64 clock cycles).

I hope this helps.

Jacques.

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to