Mark Boon wrote:
What I have now takes 10-15 microseconds on a 2Ghz Intel computer to
find all the patterns on a board (on average for 9x9, for 19x19 it's
more like 15-20 usec)
From your difference between 9x9 and 19x19 I assume that you are updating
the patterns of the cells after a stone was placed, (else the difference
should be 361/81 times) not a computation from scratch. I make this sure
so that we compare apples to apples. As far as the patterns computing is
concerned, (i.e. excluding the board part verifying captures, etc.) for
a pattern of 40 neighbors I get times easily below 500 nanoseconds
(even on an old P4 3.4 GHz) for updating 40 hashes. I explained that about
June last year, when I wrote it. Since it passed my tests (June 07) I
have never changed a character in the code that writes the 67090 asm
lines. It is just a black box, that works 100%. Each board cell has an
updating function that knows where the board limits are, the resulting
code (for hash 32 bit mode) is something like an endless:
lea ebx, [edx + ecx*2 - SizeOf_bccCell]
mov eax, 0967f6762h
bt dword ptr [ebx], bidx_StoneBit
cmovc eax, esi
xor [ebx + 4], eax
lea ebx, [edx + ecx*2]
mov eax, 0d83b6518h
bt dword ptr [ebx], bidx_StoneBit
cmovc eax, esi
xor [ebx + 4], eax
lea ebx, [edx + ecx*2 + SizeOf_bccCell]
mov eax, 0121001f7h
bt dword ptr [ebx], bidx_StoneBit
cmovc eax, esi
xor [ebx + 4], eax
The longest of these functions has 206 lines, the typical about 120. There
is not a single conditional jump, no vars in RAM, etc.
It's written in Java so making it in C would possibly give a two-fold
speedup.
I hate the Java war but, when I said that it is taking a 10 year handicap,
at least for patterns that is true with no doubt. My code running on 1998
hardware (it is compatible) outperforms Java code for updating patterns.
Somehow smaller sets don't go much faster, but larger sets do slow down,
every ten-fold increase in number of patterns seems to double the time.
What I wrote above updates the hashes (or the masks, because the board
has many modes) but does not search the hash to get urgency. For searching
in a sorted list, I use the second best language after automated asm ..
.. manually written asm, of course ;-)
;; One iteration:
;; --------------
CompStep MACRO name
mov eax, ebx
add eax, ecx
shr eax, 1
and al, 0fch ;; Align to a valid pointer
cmp edx, [eax]
jz &name&out
cmovc ecx, eax ;; This is true if [eax] > value
cmovnc ebx, eax ;; This is true if the previous is false
ENDM
Also (almost) without conditional jumps. The only conditional jump is selected
once when the hash is found and exits. 10 steps of that can search in a 1024
long list, 20 steps in 1024^2. A doubling in table size is just adding one step,
(8 instructions, say 64 clock cycles).
I hope this helps.
Jacques.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/